Description

This application is a continuation-in-part of copending application Serial No. 07/934,375, filed August 21, 1992.

TECHNICAL FIELD OF THE INVENTION

This invention relates to delivery of biologically active cargo molecules, such as polypeptides and nucleic acids,
into the cytoplasm and nuclei of cells in vitro and in vivo. Intracellular delivery of cargo molecules according to this
invention is accomplished by the use of novel transport polypeptides which comprise one or more portions of HIV tat
protein and which are covalently attached to cargo molecules. The transport polypeptides of this invention are
characterized by the presence of the tat basic region (amino acids 49-57), the absence of the tat cysteine-rich region
(amino acids 22-36) and the absence of the tat exon 2-encoded carboxy-terminal domain (amino acids 73-86) of the
naturally-occurring tat protein. By virtue of the absence of the cysteine-rich region found in conventional tat proteins,
the transport polypeptides of this invention solve the problems of spurious trans-activation and disulfide aggregation.
The reduced size of the transport polypeptides of this invention also minimizes interference with the biological activity
of the cargo molecule.

BACKGROUND OF THE INVENTION 

Biological cells are generally impermeable to macromolecules, including proteins and nucleic acids. Some small
molecules enter living cells at very low rates. The lack of means for delivering macromolecules into cells in vivo has
been an obstacle to the therapeutic, prophylactic and diagnostic use of a potentially large number of proteins and
nucleic acids having intracellular sites of action. Accordingly, most therapeutic, prophylactic and diagnostic candidates
produced to date using recombinant DNA technology are polypeptides that act in the extracellular environment or on
the target cell surface.

Various methods have been developed for delivering macromolecules into cells in vitro. A list of such methods
includes electroporation, membrane fusion with liposomes, high velocity bombardment with DNA-coated
microprojectiles, incubation with calcium-phosphate-DNA precipitate, DEAE-dextran mediated transfection, infection
with modified viral nucleic acids, and direct micro-injection into single cells. These in vitro methods typically deliver
the nucleic acid molecules into only a fraction of the total cell population, and they tend to damage large numbers of
cells. Experimental delivery of macromolecules into cells in vivo has been accomplished with scrape loading, calcium
phosphate precipitates and liposomes. However, these techniques have, to date, shown limited usefulness for in vivo
cellular delivery. Moreover, even with cells in vitro, such methods are of extremely limited usefulness for delivery of
proteins.

General methods for efficient delivery of biologically active proteins into intact cells, in vitro and in vivo, are needed
(L.A. Sternson, "Obstacles to Polypeptide Delivery", Ann. N.Y. Acad. Sci., 57, pp. 19-21 (1987)). Chemical addition of
a lipopeptide (R. Hoffmann et al., "Stimulation of Human and Murine Adherent Cells by Bacterial Lipoprotein and
Synthetic Lipopeptide Analogues", Immunobiol., 177, pp. 158-70 (1988)) or a basic polymer such as polylysine or
polyarginine (W-C. Chen et al., "Conjugation of Poly-L-Lysine Albumin and Horseradish Peroxidase: A Novel Method of
Enhancing the Cellular Uptake of Proteins", Proc. Natl. Acad. Sci. USA, 75, pp. 1872-76 (1978)) has not proved to
be highly reliable or generally useful (see Example 4, infra). Folic acid has been used as a transport moiety (C.R.
Leamon and Low, "Delivery of Macromolecules into Living Cells: A Method That Exploits Folate Receptor Endocytosis",
Proc. Natl. Acad. Sci. USA, 88, pp. 5572-76 (1991)). Evidence was presented for internalization of folate conjugates,
but not for cytoplasmic delivery. Given the high levels of circulating folate in vivo, the usefulness of this system has not
been fully demonstrated. Pseudomonas exotoxin has also been used as a transport moiety (T.I. Prior et al., "Barnase
Toxin: A New Chimeric Toxin Composed of Pseudomonas Exotoxin A and Barnase", Cell, 64, pp. 1017-23 (1991)).
The efficiency and general applicability of this system are not clear from the published work, however.

The tat protein of human immunodeficiency virus type-1 ("HIV") has demonstrated potential for delivery of cargo
proteins into cells (published PCT application WO 91/09958). However, given the chemical properties of the full-length
tat protein, generally applicable methods for its efficient use in delivery of biologically active cargo are not taught in the
art.

Tat is an HIV-encoded protein that trans-activates certain HIV genes and is essential for viral replication. The
full-length HIV-1 tat protein has 86 amino acid residues. The HIV tat gene has two exons. Tat amino acids 1-72 are
encoded by exon 1, and amino acids 73-86 are encoded by exon 2. The full-length tat protein is characterized by a basic
region which contains two lysines and six arginines (amino acids 49-57) and a cysteine-rich region which contains seven
cysteine residues (amino acids 22-37). Purified tat protein is taken up from the surrounding medium by human cells
growing in culture (A.D. Frankel and C.O. Pabo, "Cellular Uptake of the Tat Protein from Human Immunodeficiency
Virus", Cell, 55, pp. 1189-93 (1988)). The art does not teach whether the cysteine-rich region of tat protein (which
causes aggregation and insolubility) is required for cellular uptake of tat protein.

PCT patent application WO 91/09958 ("the '958 application") discloses that a heterologous protein consisting of
amino acids 1-67 of HIV tat protein genetically fused to a papillomavirus E2 trans-activation repressor polypeptide is
taken up by cultured cells. However, presentation of the cargo polypeptide's biological activity (repression of E2
trans-activation) is not demonstrated therein.

The use of tat protein, as taught in the '958 application, potentially involves practical difficulties when used for
cellular delivery of cargo proteins. Those practical difficulties include protein aggregation and insolubility involving the
cysteine-rich region of tat protein. Furthermore, the '958 application provides no examples of chemical cross-linking
of tat protein to cargo proteins, which may be critical in situations where genetic fusion of tat to the cargo protein
interferes with proper folding of the tat protein, the cargo protein, or both. In addition, both the '958 application and
Frankel and Pabo (supra) teach the use of tat transport proteins in conjunction with chloroquine, which is cytotoxic.
The need exists, therefore, for generally applicable means for safe, efficient delivery of biologically active cargo
molecules into the cytoplasm and nuclei of living cells.

SUMMARY OF THE INVENTION 

This invention solves the problems set forth above by providing processes and products for the efficient cytoplasmic
and nuclear delivery of biologically active non-tat proteins, nucleic acids and other molecules that are (1) not inherently
capable of entering target cells or cell nuclei, or (2) not inherently capable of entering target cells at a useful rate.
Intracellular delivery of cargo molecules according to this invention is accomplished by the use of novel transport
proteins which comprise one or more portions of HIV tat protein and which are covalently attached to the cargo
molecules. More particularly, this invention relates to novel transport polypeptides, methods for making those transport
polypeptides, transport polypeptide-cargo conjugates, pharmaceutical, prophylactic and diagnostic compositions
comprising transport polypeptide-cargo conjugates and methods for delivery of cargo into cells by means of tat-related
transport polypeptides.

The transport polypeptides of this invention are characterized by the presence of the tat basic region amino acid
sequence (amino acids 49-57 of naturally-occurring tat protein); the absence of the tat cysteine-rich region amino acid
sequence (amino acids 22-36 of naturally-occurring tat protein) and the absence of the tat exon 2-encoded
carboxy-terminal domain (amino acids 73-86 of naturally-occurring tat protein). Preferred embodiments of such transport
polypeptides are: tat37-72 (SEQ ID NO:2), tat37-58 (SEQ ID NO:3), tat38-58GGC (SEQ ID NO:4), tatCGG47-58 (SEQ
ID NO:5), tat47-58GGC (SEQ ID NO:6), and tatΔcys (SEQ ID NO:7). It will be recognized by those of ordinary skill in
the art that when the transport polypeptide is genetically fused to the cargo moiety, an amino-terminal methionine must
be added, but the spacer amino acids (e.g., CysGlyGly or GlyGlyCys) need not be added. By virtue of the absence of
the cysteine-rich region present in conventional tat proteins, transport polypeptides of this invention solve the problem
of disulfide aggregation, which can result in loss of the cargo's biological activity, insolubility of the transport
polypeptide-cargo conjugate, or both. The reduced size of the transport polypeptides of this invention also advantageously
minimizes interference with the biological activity of the cargo. A further advantage of the reduced transport polypeptide
size is enhanced uptake efficiency in embodiments of this invention involving attachment of multiple transport
polypeptides per cargo molecule.
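
The names of the preferred embodiments listed above can be read mechanically. The following Python sketch is illustrative only. It assumes that a name such as tat38-58GGC denotes tat residues 38-58 followed by a GlyGlyCys spacer, and that tatCGG47-58 denotes a CysGlyGly spacer followed by tat residues 47-58. The names `TransportName`, `parse_transport_name` and `meets_stated_criteria` are hypothetical and form no part of this disclosure; names that carry no residue range are not handled.

```python
import re
from typing import NamedTuple


class TransportName(NamedTuple):
    n_spacer: str  # one-letter spacer residues preceding the tat segment
    start: int     # first tat residue (1-based, inclusive)
    end: int       # last tat residue (1-based, inclusive)
    c_spacer: str  # one-letter spacer residues following the tat segment


# Assumed naming pattern: "tat" + optional spacer + "first-last" + optional spacer.
_NAME = re.compile(r"^tat([A-Z]*)(\d+)-(\d+)([A-Z]*)$")


def parse_transport_name(name: str) -> TransportName:
    """Split a name such as 'tatCGG47-58' into spacers and residue range."""
    match = _NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognized transport polypeptide name: {name!r}")
    n_spacer, start, end, c_spacer = match.groups()
    first, last = int(start), int(end)
    # Full-length HIV-1 tat has 86 residues, as stated in the text.
    if not 1 <= first <= last <= 86:
        raise ValueError(f"residue range out of bounds in {name!r}")
    return TransportName(n_spacer, first, last, c_spacer)


def meets_stated_criteria(t: TransportName) -> bool:
    """Check one contiguous tat segment against the three stated criteria."""
    has_basic_region = t.start <= 49 and t.end >= 57   # residues 49-57 present
    lacks_cys_rich = t.start > 36 or t.end < 22        # residues 22-36 absent
    lacks_exon2_domain = t.end < 73                    # residues 73-86 absent
    return has_basic_region and lacks_cys_rich and lacks_exon2_domain
```

Under these assumptions, each of the range-named embodiments listed above satisfies `meets_stated_criteria`, whereas a name denoting the full-length protein (for example, a hypothetical "tat1-86") does not.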

Transport polypeptides of this invention may be advantageously attached to cargo molecules by chemical
cross-linking or by genetic fusion. According to preferred embodiments of this invention, the transport polypeptide and the
cargo molecule are chemically cross-linked. A unique terminal cysteine residue is a preferred means of chemical
cross-linking. According to other preferred embodiments of this invention, the carboxy terminus of the transport moiety is
genetically fused to the amino terminus of the cargo moiety. A particularly preferred embodiment of the present invention
is JB106, which consists of an amino-terminal methionine followed by tat residues 47-58, followed by HPV-16 E2
residues 245-365.

In many cases, the novel transport polypeptides of this invention advantageously avoid chloroquine-associated
toxicity. According to one preferred embodiment of this invention, a biologically active cargo is delivered into the cells
of various organs and tissues following introduction of a transport polypeptide-cargo conjugate into a live human or
animal. By virtue of the foregoing features, this invention opens the way for biological research and disease therapy
involving proteins, nucleic acids and other molecules with cytoplasmic or nuclear sites of action.

BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1 depicts the amino acid sequence of HIV-1 tat protein (SEQ ID NO:1). 
Figure 2 summarizes the results of cellular uptake experiments with transport polypeptide-Pseudomonas exotoxin
ribosylation domain conjugates (shaded bars, unconjugated; diagonally-hatched bars, conjugated).
Figure 3 summarizes the results of cellular uptake experiments with transport polypeptide-ribonuclease conjugates
(closed squares, ribonuclease-SMCC without transport moiety; closed circles, tat37-72-ribonuclease; closed triangles,
tat38-58GGC-ribonuclease; closed diamonds, tatCGG38-58-ribonuclease; open squares, tatCGG47-58-ribonuclease).
Figure 4 schematically depicts the construction of plasmid pAHE2.
Figure 5 schematically depicts the construction of plasmid pET8c123.
Figure 6 schematically depicts the construction of plasmid pET8c123CCSS.
Figure 7 summarizes the results of cellular uptake experiments with transport polypeptide-E2 repressor conjugates
(open diamonds, E2.123 cross-linked to tat37-72, without chloroquine; closed diamonds, E2.123 cross-linked to
tat37-72, with chloroquine; open circles, E2.123CCSS cross-linked to tat37-72, without chloroquine; closed circles,
E2.123CCSS cross-linked to tat37-72, with chloroquine).
Figure 8 schematically depicts the construction of plasmid pTATΔcys.
Figure 9 schematically depicts the construction of plasmid pFTE501.
Figure 10 schematically depicts the construction of plasmid pTATΔcys-249.
Figure 11 schematically depicts the construction of plasmid pJB106.
Figure 12 depicts the complete amino acid sequence of protein JB106.
Figure 13 summarizes the results of E2 repression assays involving JB106 (squares), TxHE2CCSS (diamonds)
and hiE2.123 (circles). The assays were carried out in COS7 cells, without chloroquine, as described in Example 14.

DETAILED DESCRIPTION OF THE INVENTION 

In order that the invention herein described may be more fully understood, the following detailed description is set
forth.

In the description, the following terms are employed: 

Amino acid - A monomeric unit of a peptide, polypeptide or protein. The twenty protein amino acids (L-isomers)
are: alanine ("Ala" or "A"), arginine ("Arg" or "R"), asparagine ("Asn" or "N"), aspartic acid ("Asp" or "D"), cysteine ("Cys"
or "C"), glutamine ("Gln" or "Q"), glutamic acid ("Glu" or "E"), glycine ("Gly" or "G"), histidine ("His" or "H"), isoleucine
("Ile" or "I"), leucine ("Leu" or "L"), lysine ("Lys" or "K"), methionine ("Met" or "M"), phenylalanine ("Phe" or "F"), proline
("Pro" or "P"), serine ("Ser" or "S"), threonine ("Thr" or "T"), tryptophan ("Trp" or "W"), tyrosine ("Tyr" or "Y") and valine
("Val" or "V"). The term amino acid, as used herein, also includes analogs of the protein amino acids, and D-isomers
of the protein amino acids and their analogs.

Cargo - A molecule that is not a tat protein or a fragment thereof, and that is either (1) not inherently capable of
entering target cells, or (2) not inherently capable of entering target cells at a useful rate. ("Cargo", as used in this
application, refers either to a molecule, per se, i.e., before conjugation, or to the cargo moiety of a transport
polypeptide-cargo conjugate.) Examples of "cargo" include, but are not limited to, small molecules and macromolecules, such as
polypeptides, nucleic acids and polysaccharides.

Chemical cross-linking - Covalent bonding of two or more pre-formed molecules.

Cargo conjugate - A molecule comprising at least one transport polypeptide moiety and at least one cargo moiety 
formed either through genetic fusion or chemical cross-linking of a transport polypeptide and a cargo molecule. 

Genetic fusion - Colinear, covalent linkage of two or more proteins via their polypeptide backbones, through
genetic expression of contiguous DNA sequences encoding the proteins.

Macromolecule - A molecule, such as a peptide, polypeptide, protein or nucleic acid.

Polypeptide - Any polymer consisting essentially of any of the 20 protein amino acids (above), regardless of its
size. Although "protein" is often used in reference to relatively large polypeptides, and "peptide" is often used in
reference to small polypeptides, usage of these terms in the art overlaps and varies. The term "polypeptide" as used herein
refers to peptides, polypeptides and proteins, unless otherwise noted.

Reporter gene - A gene the expression of which depends on the occurrence of a cellular event of interest, and
the expression of which can be conveniently observed in a genetically transformed host cell.

Reporter plasmid - A plasmid vector comprising one or more reporter genes.

Small molecule - A molecule other than a macromolecule.

Spacer amino acid - An amino acid (preferably having a small side chain) included between a transport moiety
and an amino acid residue used for chemical cross-linking (e.g., to provide molecular flexibility and avoid steric
hindrance).

Target cell - A cell into which a cargo is delivered by a transport polypeptide. A "target cell" may be any cell,
including human cells, either in vivo or in vitro.

Transport moiety or transport polypeptide - A polypeptide capable of delivering a covalently attached cargo into
a target cell.

This invention is generally applicable for therapeutic, prophylactic or diagnostic intracellular delivery of small
molecules and macromolecules, such as proteins, nucleic acids and polysaccharides, that are not inherently capable of
entering target cells at a useful rate. It should be appreciated, however, that alternate embodiments of this invention
are not limited to clinical applications. This invention may be advantageously applied in medical and biological research.
In research applications of this invention, the cargo may be a drug or a reporter molecule. Transport polypeptides of
this invention may be used as research laboratory reagents, either alone or as part of a transport polypeptide
conjugation kit.

The target cells may be in vivo cells, i.e., cells composing the organs or tissues of living animals or humans, or
microorganisms found in living animals or humans. The target cells may also be in vitro cells, i.e., cultured animal cells,
human cells or microorganisms.

Wide latitude exists in the selection of drugs and reporter molecules for use in the practice of this invention. Factors
to be considered in selecting reporter molecules include, but are not limited to, the type of experimental information
sought, non-toxicity, convenience of detection, quantifiability of detection, and availability. Many such reporter
molecules are known to those skilled in the art.

As will be appreciated from the examples presented below, we have used enzymes for which colorimetric assays
exist, as model cargo to demonstrate the operability and useful features of the transport polypeptides of this invention.
These enzyme cargos provide for sensitive, convenient, visual detection of cellular uptake. Furthermore, since visual
readout occurs only if the enzymatic activity of the cargo is preserved, these enzymes provide a sensitive and reliable
test for preservation of biological activity of the cargo moiety in transport polypeptide-cargo conjugates according to
this invention. A preferred embodiment of this invention comprises horseradish peroxidase ("HRP") as the cargo moiety
of the transport polypeptide-cargo conjugate. A particularly preferred model cargo moiety for practice of this invention
is β-galactosidase.

Model cargo proteins may also be selected according to their site of action within the cell. As described in Examples
6 and 7, below, we have used the ADP ribosylation domain from Pseudomonas exotoxin ("PE") and pancreatic
ribonuclease to confirm cytoplasmic delivery of properly folded cargo proteins by transport polypeptides according to
this invention.

Full-length Pseudomonas exotoxin is itself capable of entering cells, where it inactivates ribosomes by means of
an ADP ribosylation reaction, thus killing the cells. A portion of the Pseudomonas exotoxin protein known as the ADP
ribosylation domain is incapable of entering cells, but it retains the ability to inactivate ribosomes if brought into contact
with them. Thus, cell death induced by transport polypeptide-PE ADP ribosylation domain conjugates is a test for
cytoplasmic delivery of the cargo by the transport polypeptide.

We have also used ribonuclease to confirm cytoplasmic delivery of a properly folded cargo protein by transport
polypeptides of this invention. Protein synthesis, an RNA-dependent process, is highly sensitive to ribonuclease, which
digests RNA. Ribonuclease is, by itself, incapable of entering cells, however. Thus, inhibition of protein synthesis by
a transport polypeptide-ribonuclease conjugate is a test for intracellular delivery of biologically active ribonuclease.

Of course, delivery of a given cargo molecule to the cytoplasm may be followed by further delivery of the same 
cargo molecule to the nucleus. Nuclear delivery necessarily involves traversing some portion of the cytoplasm. 
Papillomavirus E2 repressor proteins are examples of macromolecular drugs that may be delivered into the nuclei
of target cells by the transport polypeptides of this invention. Papillomavirus E2 protein, which normally exists as a
homodimer, regulates both transcription and replication of the papillomavirus genome. The carboxy-terminal domain
of the E2 protein contains DNA binding and dimerization activities. Transient expression of DNA sequences encoding
various E2 analogs or E2 carboxy-terminal fragments in transfected mammalian cells inhibits trans-activation by the
full-length E2 protein (J. Barsoum et al., "Mechanism of Action of the Papillomavirus E2 Repressor: Repression in the
Absence of DNA Binding", J. Virol., 66, pp. 3941-3945 (1992)). E2 repressors added to the growth medium of cultured
mammalian cells do not enter the cells, and thus do not inhibit E2 trans-activation in those cells. However, conjugation
of the transport polypeptides of this invention to E2 repressors results in translocation of the E2 repressors from the
growth medium into the cultured cells, where they display biological activity, repressing E2-dependent expression of
a reporter gene.

The rate at which single-stranded and double-stranded nucleic acids enter cells, in vitro and in vivo, may be
advantageously enhanced, using the transport polypeptides of this invention. As shown in Example 11 (below), methods
for chemical cross-linking of polypeptides to nucleic acids are well known in the art. In a preferred embodiment of this
invention, the cargo is a single-stranded antisense nucleic acid. Antisense nucleic acids are useful for inhibiting cellular
expression of sequences to which they are complementary. In another embodiment of this invention, the cargo is a
double-stranded nucleic acid comprising a binding site recognized by a nucleic acid-binding protein. An example of
such a nucleic acid-binding protein is a viral trans-activator.

Naturally-occurring HIV-1 tat protein (Figure 1) has a region (amino acids 22-37) wherein 7 out of 16 amino acids
are cysteine. Those cysteine residues are capable of forming disulfide bonds with each other, with cysteine residues
in the cysteine-rich region of other tat protein molecules and with cysteine residues in a cargo protein or the cargo
moiety of a conjugate. Such disulfide bond formation can cause loss of the cargo's biological activity. Furthermore,
even if there is no potential for disulfide bonding to the cargo moiety (for example, when the cargo protein has no
cysteine residues), disulfide bond formation between transport polypeptides leads to aggregation and insolubility of
the transport polypeptide, the transport polypeptide-cargo conjugate, or both. The tat cysteine-rich region is potentially
a source of serious problems in the use of naturally-occurring tat protein for cellular delivery of cargo molecules.

The cysteine-rich region is required for dimerization of tat in vitro, and is required for trans-activation of HIV DNA
sequences. Therefore, removal of the tat cysteine-rich region has the additional advantage of eliminating the natural
activity of tat, i.e., induction of HIV transcription and replication. However, the art does not teach whether the
cysteine-rich region of the tat protein is required for cellular uptake.

The present invention includes embodiments wherein the problems associated with the tat cysteine-rich region
are solved, because that region is not present in the transport polypeptides described herein. In those embodiments,
cellular uptake of the transport polypeptide or transport polypeptide-cargo molecule conjugate still occurs. In one group
of preferred embodiments of this invention, the sequence of amino acids preceding the cysteine-rich region is fused
directly to the sequence of amino acids following the cysteine-rich region. Such transport polypeptides are called
tatΔcys, and have the general formula (tat1-21)-(tat38-n), where n is the number of the carboxy-terminal residue, i.e.,
49-86. Preferably, n is 58-72. As will be appreciated from the examples below, the amino acid sequence preceding the
cysteine-rich region of the tat protein is not required for cellular uptake. A preferred transport polypeptide (or transport
moiety) consists of amino acids 37-72 of tat protein, and is called tat37-72 (SEQ ID NO:2). Retention of tat residue 37,
a cysteine, at the amino terminus of the transport polypeptide is preferred, because it is useful for chemical cross-linking.
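
The general formula (tat1-21)-(tat38-n) amounts to joining two residue ranges of the full-length sequence. The following Python sketch is offered for illustration only. It assumes that the caller supplies the full-length tat sequence (SEQ ID NO:1, Figure 1) as a string of one-letter codes; the sequence is not reproduced here, and the function names `residues` and `tat_delta_cys` are hypothetical.

```python
def residues(seq: str, first: int, last: int) -> str:
    """Return residues first..last of seq, numbered from 1, both ends inclusive."""
    if not 1 <= first <= last <= len(seq):
        raise ValueError(f"range {first}-{last} lies outside a {len(seq)}-residue sequence")
    return seq[first - 1:last]


def tat_delta_cys(tat: str, n: int) -> str:
    """Join tat residues 1-21 directly to residues 38-n, omitting residues 22-37."""
    if not 49 <= n <= 86:
        raise ValueError("n must be from 49 to 86")
    return residues(tat, 1, 21) + residues(tat, 38, n)


def tat_segment(tat: str, first: int, last: int) -> str:
    """Return a single contiguous segment, e.g. first=37, last=72 for tat37-72."""
    return residues(tat, first, last)
```

For the preferred range of n from 58 to 72, the resulting polypeptide is between 42 and 56 residues long (21 residues from the amino-terminal portion plus n minus 37 residues from the portion following the cysteine-rich region).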

The advantages of the tatΔcys polypeptides, tat37-72 and other embodiments of this invention include the following:

a) The natural activity of tat protein, i.e., induction of HIV transcription, is eliminated;

b) Dimers and higher multimers of the transport polypeptide are avoided;

c) The level of expression of tatΔcys genetic fusions in E. coli may be improved;

d) Some transport polypeptide conjugates display increased solubility and superior ease of handling; and

e) Some fusion proteins display increased activity by the cargo moiety, as compared with fusions containing the
cysteine-rich region.

Numerous chemical cross-linking methods are known and potentially applicable for conjugating the transport
polypeptides of this invention to cargo macromolecules. Many known chemical cross-linking methods are non-specific,
i.e., they do not direct the point of coupling to any particular site on the transport polypeptide or cargo macromolecule.
As a result, non-specific cross-linking agents may attack functional sites or sterically block active sites, rendering
the conjugated proteins biologically inactive.

A preferred approach to increasing coupling specificity in the practice of this invention is direct chemical coupling
to a functional group found only once or a few times in one or both of the polypeptides to be cross-linked. For example,
in many proteins, cysteine, which is the only protein amino acid containing a thiol group, occurs only a few times. Also,
for example, if a polypeptide contains no lysine residues, a cross-linking reagent specific for primary amines will be
selective for the amino terminus of that polypeptide. Successful utilization of this approach to increase coupling
specificity requires that the polypeptide have the suitably rare and reactive residues in areas of the molecule that may be
altered without loss of the molecule's biological activity.
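
Whether a polypeptide offers a suitably rare coupling site can be judged from its amino acid sequence alone. The following Python sketch is illustrative only. It assumes a sequence written in one-letter codes and an unblocked amino terminus; the function names and the example peptides used to exercise them are hypothetical, and the sketch says nothing about whether a given site lies in a region that tolerates modification.

```python
from typing import List


def positions_of(seq: str, residue: str) -> List[int]:
    """Return the 1-based positions at which the given one-letter residue occurs."""
    return [i for i, aa in enumerate(seq.upper(), start=1) if aa == residue]


def thiol_sites(seq: str) -> List[int]:
    """Cysteine is the only protein amino acid with a thiol group."""
    return positions_of(seq, "C")


def has_unique_thiol(seq: str) -> bool:
    """True when exactly one cysteine is available for thiol-directed coupling."""
    return len(thiol_sites(seq)) == 1


def amine_reagent_selects_amino_terminus(seq: str) -> bool:
    """True when the sequence has no lysine, so that the amino terminus is the
    only primary amine contributed by the protein amino acids."""
    return len(positions_of(seq, "K")) == 0
```

For a hypothetical peptide "GRRQRRPPQC", `thiol_sites` returns `[10]`, `has_unique_thiol` returns `True`, and `amine_reagent_selects_amino_terminus` returns `True`, because the peptide contains one cysteine and no lysine.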

As demonstrated in the examples below, cysteine residues may be replaced when they occur in parts of a
polypeptide sequence where their participation in a cross-linking reaction would likely interfere with biological activity. When
a cysteine residue is replaced, it is typically desirable to minimize resulting changes in polypeptide folding. Changes
in polypeptide folding are minimized when the replacement is chemically and sterically similar to cysteine. For these
reasons, serine is preferred as a replacement for cysteine. As demonstrated in the examples below, a cysteine residue
may be introduced into a polypeptide's amino acid sequence for cross-linking purposes. When a cysteine residue is
introduced, introduction at or near the amino or carboxy terminus is preferred. Conventional methods are available for
such amino acid sequence modifications, whether the polypeptide of interest is produced by chemical synthesis or
expression of recombinant DNA.
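
At the level of the amino acid sequence, the two modifications described above are simple string operations. The following Python sketch is illustrative only. It assumes sequences in one-letter codes; the function names are hypothetical, and the default two-glycine spacer is an assumption chosen to mirror the CysGlyGly and GlyGlyCys spacers mentioned elsewhere in this description.

```python
from typing import Iterable


def replace_cysteines_with_serine(seq: str, positions: Iterable[int]) -> str:
    """Replace the cysteines at the given 1-based positions with serine."""
    out = list(seq)
    for pos in positions:
        if not 1 <= pos <= len(out):
            raise ValueError(f"position {pos} lies outside the sequence")
        if out[pos - 1] != "C":
            raise ValueError(f"residue {pos} is {out[pos - 1]}, not cysteine")
        out[pos - 1] = "S"
    return "".join(out)


def add_terminal_cysteine(seq: str, terminus: str = "C", spacer: str = "GG") -> str:
    """Add one cysteine, separated by spacer residues, at the chosen terminus.

    terminus="N" gives Cys-spacer-seq; terminus="C" gives seq-spacer-Cys.
    """
    if terminus == "N":
        return "C" + spacer + seq
    if terminus == "C":
        return seq + spacer + "C"
    raise ValueError("terminus must be 'N' or 'C'")
```

Combining the two steps, first replacing internal cysteines and then adding a terminal cysteine, yields a sequence with a single thiol available for cross-linking.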

Cross-linking reagents may be homobifunctional, i.e., having two functional groups that undergo the same reaction.
A preferred homobifunctional cross-linking reagent is bismaleimidohexane ("BMH"). BMH contains two maleimide
functional groups, which react specifically with sulfhydryl-containing compounds under mild conditions (pH 6.5-7.7). The
two maleimide groups are connected by a hydrocarbon chain. Therefore, BMH is useful for irreversible cross-linking
of polypeptides that contain cysteine residues.

Cross-linking reagents may also be heterobifunctional. Heterobifunctional cross-linking agents have two different
functional groups, for example an amine-reactive group and a thiol-reactive group, that will cross-link two proteins
having free amines and thiols, respectively. Examples of heterobifunctional cross-linking agents are succinimidyl
4-(N-maleimidomethyl)cyclohexane-1-carboxylate ("SMCC"), m-maleimidobenzoyl-N-hydroxysuccinimide ester ("MBS"),
and succinimidyl 4-(p-maleimidophenyl)butyrate ("SMPB"), an extended chain analog of MBS. The succinimidyl group
of these crosslinkers reacts with a primary amine, and the thiol-reactive maleimide forms a covalent bond with the
thiol of a cysteine residue.

Cross-linking reagents often have low solubility in water. A hydrophilic moiety, such as a sulfonate group, may be
added to the cross-linking reagent to improve its water solubility. Sulfo-MBS and sulfo-SMCC are examples of
cross-linking reagents modified for water solubility.
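
The cross-linking reagents named above can be tabulated by the functional groups with which their two reactive ends combine. The following Python sketch is illustrative only. The table restates the reactivities given in the text for BMH, SMCC, MBS and SMPB; the class and function names are hypothetical, and the sulfonated variants are omitted because their entries would rest on an assumption that sulfonation leaves reactivity unchanged.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class CrossLinker:
    name: str
    targets: Tuple[str, str]  # groups with which the two reactive ends combine

    @property
    def homobifunctional(self) -> bool:
        """True when both reactive ends undergo the same reaction."""
        return self.targets[0] == self.targets[1]


# Reactivities as stated in the text: maleimide ends react with thiols,
# succinimidyl ends react with primary amines.
REAGENTS = [
    CrossLinker("BMH", ("thiol", "thiol")),
    CrossLinker("SMCC", ("amine", "thiol")),
    CrossLinker("MBS", ("amine", "thiol")),
    CrossLinker("SMPB", ("amine", "thiol")),
]


def candidates(group_a: str, group_b: str) -> List[str]:
    """Names of tabulated reagents able to join a group_a site to a group_b site."""
    wanted = sorted((group_a, group_b))
    return [r.name for r in REAGENTS if sorted(r.targets) == wanted]
```

For example, `candidates("thiol", "thiol")` selects the homobifunctional reagent BMH, while `candidates("thiol", "amine")` selects the three heterobifunctional reagents.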

Many cross-linking reagents yield a conjugate that is essentially non-cleavable under cellular conditions. However,
some cross-linking reagents contain a covalent bond, such as a disulfide, that is cleavable under cellular conditions.
For example, dithiobis(succinimidylpropionate) ("DSP"), Traut's reagent and N-succinimidyl 3-(2-pyridyldithio)propionate
("SPDP") are well-known cleavable crosslinkers. The use of a cleavable cross-linking reagent permits the cargo
moiety to separate from the transport polypeptide after delivery into the target cell. Direct disulfide linkage may also
be useful.

Some new cross-linking reagents, such as N-γ-maleimidobutyryloxy-succinimide ester ("GMBS") and sulfo-GMBS,
have reduced immunogenicity. In some embodiments of the present invention, such reduced immunogenicity may be
advantageous.

Numerous cross-linking reagents, including the ones discussed above, are commercially available. Detailed
instructions for their use are readily available from the commercial suppliers. A general reference on protein cross-linking
and conjugate preparation is: S.S. Wong, Chemistry of Protein Conjugation and Cross-Linking, CRC Press (1991).

Chemical cross-linking may include the use of spacer arms. Spacer arms provide intramolecular flexibility or adjust
intramolecular distances between conjugated moieties and thereby may help preserve biological activity. A spacer arm
may be in the form of a polypeptide moiety comprising spacer amino acids. Alternatively, a spacer arm may be part of
the cross-linking reagent, such as in "long-chain SPDP" (Pierce Chem. Co., Rockford, IL, cat. No. 21651 H).

The pharmaceutical compositions of this invention may be for therapeutic, prophylactic or diagnostic applications,
and may be in a variety of forms. These include, for example, solid, semi-solid, and liquid dosage forms, such as
tablets, pills, powders, liquid solutions or suspensions, aerosols, liposomes, suppositories, injectable and infusible
solutions and sustained release forms. The preferred form depends on the intended mode of administration and the
therapeutic, prophylactic or diagnostic application. The transport polypeptide-cargo molecule conjugates of this
invention may be administered by conventional routes of administration, such as parenteral, subcutaneous, intravenous,
intramuscular, intralesional or aerosol routes. The compositions also preferably include conventional pharmaceutically
acceptable carriers and adjuvants that are known to those of skill in the art.

Generally, the pharmaceutical compositions of the present invention may be formulated and administered using
methods and compositions similar to those used for pharmaceutically important polypeptides such as, for example,
alpha interferon. It will be understood that conventional doses will vary depending upon the particular cargo involved.

The processes and compositions of this invention may be applied to any organism, including humans. The processes
and compositions of this invention may also be applied to animals and humans in utero.

For many pharmaceutical applications of this invention, it is necessary for the cargo molecule to be translocated
from body fluids into cells of tissues in the body, rather than from a growth medium into cultured cells. Therefore, in
addition to examples below involving cultured cells, we have provided examples demonstrating delivery of model cargo
proteins into cells of various mammalian organs and tissues, following intravenous injection of transport polypeptide-cargo
protein conjugates into live animals. These cargo proteins display biological activity following delivery into the
cells in vivo.

As demonstrated in the examples that follow, using the amino acid and DNA sequence information provided herein,
the transport polypeptides of this invention may be chemically synthesized or produced by recombinant DNA methods.
Methods for chemical synthesis or recombinant DNA production of polypeptides having a known amino acid sequence
are well known. Automated equipment for polypeptide or DNA synthesis is commercially available. Host cells, cloning
vectors, DNA expression control sequences and oligonucleotide linkers are also commercially available.

Using well-known techniques, one of skill in the art can readily make minor additions, deletions or substitutions in
the preferred transport polypeptide amino acid sequences set forth herein. It should be understood, however, that such
variations are within the scope of this invention.

Furthermore, tat proteins from other viruses, such as HIV-2 (M. Guyader et al., "Genome Organization and Transactivation
of the Human Immunodeficiency Virus Type 2", Nature, 326, pp. 662-669 (1987)), equine infectious anemia
virus (R. Carroll et al., "Identification of Lentivirus Tat Functional Domains Through Generation of Equine Infectious
Anemia Virus/Human Immunodeficiency Virus Type 1 tat Gene Chimeras", J. Virol., 65, pp. 3460-67 (1991)), and simian
immunodeficiency virus (L. Chakrabarti et al., "Sequence of Simian Immunodeficiency Virus from Macaque and Its
Relationship to Other Human and Simian Retroviruses", Nature, 328, pp. 543-47 (1987); S.K. Arya et al., "New Human
and Simian HIV-Related Retroviruses Possess Functional Transactivator (tat) Gene", Nature, 328, pp. 548-550 (1987)),
are known. It should be understood that polypeptides derived from those tat proteins and characterized by the presence
of the tat basic region and the absence of the tat cysteine-rich region fall within the scope of the present invention.

In order that the invention described herein may be more fully understood, the following examples are set forth. It
should be understood that these examples are for illustrative purposes only and are not to be construed as limiting
this invention in any manner. Throughout these examples, all molecular cloning reactions were carried out according
to methods in J. Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd Edition, Cold Spring Harbor Laboratory
(1989), except where otherwise noted.

EXAMPLE 1 

Production and Purification of Transport Polypeptides 
Recombinant DNA 

Plasmid pTat72 was a starting clone for bacterial production of tat-derived transport polypeptides and construction
of genes encoding transport polypeptide-cargo protein fusions. We obtained plasmid pTat72 (described in Frankel and
Pabo, supra) from Alan Frankel (The Whitehead Institute for Biomedical Research, Cambridge, MA). Plasmid pTat72
was derived from the pET-3a expression vector of F. W. Studier et al. ("Use of T7 RNA Polymerase to Direct Expression
of Cloned Genes", Methods Enzymol., 185, pp. 60-90 (1990)) by insertion of a synthetic gene encoding amino acids
1 to 72 of HIV-1 tat. The tat coding region employs E. coli codon usage and is driven by the bacteriophage T7 polymerase
promoter inducible with isopropyl beta-D-thiogalactopyranoside ("IPTG"). Tat protein constituted 5% of total E. coli protein
after IPTG induction.

Purification of Tat1-72 from Bacteria

We suspended E. coli expressing tat1-72 protein in 10 volumes of 25 mM Tris-HCl (pH 7.5), 1 mM EDTA. We lysed
the cells in a French press and removed the insoluble debris by centrifugation at 10,000 x g for 1 hour. We loaded the
supernatant onto a Q Sepharose Fast Flow (Pharmacia LKB, Piscataway, NJ) ion exchange column (20 ml resin/60
ml lysate). We treated the flow-through fraction with 0.5 M NaCl, which caused the tat protein to precipitate. We collected
the salt-precipitated protein by centrifugation at 35,000 rpm, in a 50.2 rotor, for 1 hour. We dissolved the pelleted
precipitate in 6 M guanidine-HCl and clarified the solution by centrifugation at 35,000 rpm, in a 50.2 rotor, for 1 hour.
We loaded the clarified sample onto an A.5 agarose gel filtration column equilibrated with 6 M guanidine-HCl, 50 mM
sodium phosphate (pH 5.4), 10 mM DTT, and then eluted the sample with the same buffer. We loaded the tat protein-containing
gel filtration fractions onto a C4 reverse phase HPLC column and eluted with a gradient of 0-75% acetonitrile,
0.1% trifluoroacetic acid. Using this procedure, we produced about 20 mg of tat1-72 protein per liter of E. coli culture
(assuming 6 g of cells per liter). This represented an overall yield of about 50%.
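
By way of illustration only, the yield figures in the preceding paragraph can be restated with a few lines of arithmetic. The following Python sketch is not part of the procedure; the function name and its parameters are hypothetical, and it assumes that the overall yield of about 50% refers to recovery of tat1-72 relative to the amount present in the starting lysate.

```python
def purification_summary(recovered_mg_per_l, overall_yield, cells_g_per_l):
    """Back-calculate starting tat protein and recovery per gram of cells.

    recovered_mg_per_l: purified protein obtained per liter of culture
    overall_yield:      fraction of starting tat protein recovered (0-1)
    cells_g_per_l:      wet cell mass per liter of culture
    """
    starting_mg_per_l = recovered_mg_per_l / overall_yield
    recovered_mg_per_g_cells = recovered_mg_per_l / cells_g_per_l
    return starting_mg_per_l, recovered_mg_per_g_cells


# Values taken from the text: 20 mg/l recovered, about 50% yield, 6 g cells/l.
starting, per_gram = purification_summary(20.0, 0.50, 6.0)
print(f"implied tat1-72 in starting material: {starting:.0f} mg per liter")
print(f"purified tat1-72 per gram of cells:   {per_gram:.1f} mg")
```

Under that assumption, the stated figures imply roughly twice as much tat1-72 in the starting lysate as in the purified product.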

Upon SDS-PAGE analysis, the tat1-72 polypeptide migrated as a single band of 10 kD. The purified tat1-72
polypeptide was active in an uptake/transactivation assay. We added the polypeptide to the culture medium of human
hepatoma cells containing a tat-responsive tissue plasminogen activator ("tPA") reporter gene. In the presence of 0.1
mM chloroquine, the purified tat1-72 protein (100 ng/ml) induced tPA expression approximately 150-fold.

Chemical Synthesis of Transport Polypeptides 

For chemical synthesis of the various transport polypeptides, we used a commercially-available, automated system
(Applied Biosystems Model 430A synthesizer) and followed the system manufacturer's recommended procedures. We
removed blocking groups by HF treatment and isolated the synthetic polypeptides by conventional reverse phase HPLC
methods. The integrity of all synthetic polypeptides was confirmed by mass spectrometer analysis.

EXAMPLE 2 

β-Galactosidase Conjugates
Chemical Cross-Linking with SMCC 

For acetylation of β-galactosidase (to block cysteine sulfhydryl groups) we dissolved 6.4 mg of commercially obtained
β-galactosidase (Pierce Chem. Co., cat. no. 32101G) in 200 µl of 50 mM phosphate buffer (pH 7.5). To the 200
µl of β-galactosidase solution, we added 10 µl of iodoacetic acid, prepared by dissolving 30 mg of iodoacetic acid in 4
ml of 50 mM phosphate buffer (pH 7.5). (In subsequent experiments we found iodoacetamide to be a preferable substitute
for iodoacetic acid.) We allowed the reaction to proceed for 60 minutes at room temperature. We then separated
the acetylated β-galactosidase from the unreacted iodoacetic acid by loading the reaction mixture on a
small G-25 (Pharmacia LKB, Piscataway, NJ) gel filtration column and collecting the void volume.
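
For orientation, the amounts given above can be converted into an approximate molar excess of blocking reagent over enzyme. The Python sketch below is illustrative only. The molecular weight of iodoacetic acid (185.95 g/mol) follows from its formula; the molecular weight used for the β-galactosidase tetramer (465,000 g/mol) is an assumption that is not stated in the text, and the helper name `nmol` is hypothetical.

```python
IODOACETIC_ACID_MW = 185.95   # g/mol, from the formula C2H3IO2
BGAL_TETRAMER_MW = 465_000.0  # g/mol, ASSUMED value for the tetramer


def nmol(mass_mg, mw_g_per_mol):
    """Convert a mass in mg to an amount in nmol."""
    return mass_mg * 1e-3 / mw_g_per_mol * 1e9


# Stock: 30 mg iodoacetic acid in 4 ml buffer; 10 µl (0.010 ml) was added.
stock_mg_per_ml = 30.0 / 4.0
reagent_nmol = nmol(stock_mg_per_ml * 0.010, IODOACETIC_ACID_MW)

# Enzyme: 6.4 mg of beta-galactosidase, counted as tetramers.
enzyme_nmol = nmol(6.4, BGAL_TETRAMER_MW)

molar_excess = reagent_nmol / enzyme_nmol
print(f"iodoacetic acid added:  {reagent_nmol:.0f} nmol")
print(f"enzyme tetramer:        {enzyme_nmol:.1f} nmol")
print(f"approximate excess:     {molar_excess:.0f}-fold per tetramer")
```

A different assumed molecular weight for the enzyme would change the computed excess proportionally.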

Prior to SMCC activation of the amine groups of the acetylated β-galactosidase, we concentrated 2 ml of the
enzyme collected from the G-25 column to 0.3 ml in a Centricon 10 (Amicon, Danvers, MA) ultrafiltration apparatus.
To the concentrated acetylated β-galactosidase, we added 19 µg of sulfo-SMCC (Pierce Chem. Co., cat. no. 22322G)
dissolved in 15 µl of dimethylformamide ("DMF"). We allowed the reaction to proceed for 30 minutes at room temperature.
We then separated the β-galactosidase-SMCC from the DMF and unreacted SMCC by passage over a small
G-25 gel filtration column.

For chemical cross-linking of transport polypeptides to β-galactosidase, we mixed the solution of β-galactosidase-SMCC
with 100 µg of transport polypeptide (tat1-72, tat37-72, tat38-58GGC, tat37-58, tat47-58GGC or tatCGG47-58)
dissolved in 200 µl of 50 mM phosphate buffer (pH 7.5). We allowed the reaction to proceed for 60 minutes at room
temperature. We then isolated the transport polypeptide-β-galactosidase conjugate by loading the reaction mixture on
an S-200HR gel filtration column and collecting the void volume.

The transport polypeptide-β-galactosidase conjugate thus obtained yielded positive results when assayed for tat
in conventional Western blot and ELISA analyses performed with rabbit anti-tat polyclonal antibodies. For a general
discussion of Western blot and ELISA analysis, see E. Harlow and D. Lane, Antibodies: A Laboratory Manual, Cold
Spring Harbor Laboratory (1988). Gel filtration analysis with Superose 6 (Pharmacia LKB, Piscataway, NJ) indicated
the transport polypeptide-β-galactosidase conjugate to have a molecular weight of about 540,000 daltons. Specific
activity of the transport polypeptide-β-galactosidase conjugate was 52% of the specific activity of the β-galactosidase
starting material, when assayed with o-nitrophenyl-β-D-galactopyranoside ("ONPG"). The ONPG assay procedure is
described in detail at pages 16.66-16.67 of Sambrook et al. (supra).

Cellular Uptake of β-Galactosidase Conjugates

We added the conjugates to the medium of HeLa cells (ATCC no. CCL2) at 20 µg/ml, in the presence or absence
of 100 µM chloroquine. We incubated the cells for 4-18 hours at 37°C/5.5% CO2. We fixed the cells with 2% formaldehyde,
0.2% glutaraldehyde in phosphate-buffered saline ("PBS") for 5 minutes at 4°C. We then washed the cells
three times with 2 mM MgCl2 in PBS, and stained them with X-gal, at 37°C. X-gal is a colorless β-galactosidase substrate
(5-bromo-4-chloro-3-indolyl β-D-galactoside) that yields a blue product upon cleavage by β-galactosidase. Our X-gal
staining solution contained 1 mg of X-gal (Bio-Rad, Richmond, CA, cat. no. 170-3455) per ml of PBS containing 5 mM
potassium ferricyanide, 5 mM potassium ferrocyanide and 2 mM MgCl2.

We subjected the stained cells to microscopic examination at magnifications up to 400 X. Such microscopic examination
revealed nuclear staining, as well as cytoplasmic staining.

The cells to which the tat37-72-β-galactosidase conjugate or tat1-72-β-galactosidase conjugate was added stained
dark blue. β-galactosidase activity could be seen after a development time as short as 15 minutes. For comparison, it
should be noted that stain development time of at least 6 hours is normally required when β-galactosidase activity is
introduced into cells by means of transfection of the β-galactosidase gene. Nuclear staining was visible in the absence
of chloroquine, although the nuclear staining intensity was slightly greater in chloroquine-treated cells. Control cells
treated with unconjugated β-galactosidase showed no detectable staining.

Cleavable Conjugation by Direct Disulfide

Each β-galactosidase tetramer has 12 cysteine residues that may be used for direct disulfide linkage to a transport
polypeptide cysteine residue. To reduce and then protect the sulfhydryl of tat37-72, we dissolved 1.8 mg (411 nmoles)
of tat37-72 in 1 ml of 50 mM sodium phosphate (pH 8.0), 150 mM NaCl, 2 mM EDTA, and applied the solution to a
Reduce-Imm column (Pierce Chem. Co., Rockford, IL). After 30 minutes at room temperature, we eluted the tat37-72
from the column with 1 ml aliquots of the same buffer, into tubes containing 0.1 ml of 10 mM 5,5'-dithio-bis(2-nitrobenzoic
acid) ("DTNB"). We left the reduced tat37-72 polypeptide in the presence of the DTNB for 3 hours. We then removed
the unreacted DTNB from the tat37-72-TNB by gel filtration on a 9 ml Sephadex G-10 column (Pharmacia LKB, Piscataway,
NJ). We dissolved 5 mg β-galactosidase in 0.5 ml of buffer and desalted it on a 9 ml Sephadex G-25 column
(Pharmacia LKB, Piscataway, NJ), to obtain 3.8 mg of β-galactosidase/ml buffer. We mixed 0.5 ml aliquots of desalted
β-galactosidase solution with 0.25 or 0.5 ml of the tat37-72-TNB preparation, and allowed the direct disulfide cross-linking
reaction to proceed at room temperature for 30 minutes. We removed the unreacted tat37-72-TNB from the
β-galactosidase conjugate by gel filtration on a 9 ml Sephacryl S-200 column. We monitored the extent of the cross-linking
reaction indirectly, by measuring absorbance at 412 nm due to the released TNB. The direct disulfide conjugates
thus produced were taken up into cells (data not shown).
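
The quantities in the preceding paragraph lend themselves to two simple checks, sketched below in Python for illustration only. The first back-calculates the molecular weight implied by "1.8 mg (411 nmoles)" of tat37-72. The second shows how an absorbance reading at 412 nm might be converted into an amount of released TNB by the Beer-Lambert relation. The extinction coefficient (14,150 per M per cm), the 1 cm path length, the example absorbance and the function names are all assumptions introduced here; none of them appears in the text.

```python
TNB_EXTINCTION = 14_150.0  # M^-1 cm^-1 at 412 nm; ASSUMED literature value


def implied_mw(mass_mg, amount_nmol):
    """Molecular weight (g/mol) implied by a mass and a molar amount."""
    return (mass_mg * 1e-3) / (amount_nmol * 1e-9)


def tnb_released_nmol(a412, volume_ml, path_cm=1.0, epsilon=TNB_EXTINCTION):
    """Estimate released TNB (nmol) from absorbance via Beer-Lambert."""
    molar = a412 / (epsilon * path_cm)   # mol per liter
    return molar * volume_ml * 1e-3 * 1e9


peptide_mw = implied_mw(1.8, 411.0)
print(f"implied tat37-72 molecular weight: {peptide_mw:.0f} g/mol")

# Hypothetical reading: A412 = 0.1415 in a 1.0 ml reaction volume.
example_nmol = tnb_released_nmol(0.1415, 1.0)
print(f"released TNB for the hypothetical reading: {example_nmol:.1f} nmol")
```

Because one TNB is released for each disulfide formed, the estimated amount of TNB gives an indirect measure of the number of cross-links, subject to the accuracy of the assumed extinction coefficient.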

Cleavable Conjugation with SPDP

We used the heterobifunctional cross-linking reagent ("SPDP"), which contains a cleavable disulfide bond, to form
a cross-link between: (1) the primary amine groups of β-galactosidase and the cysteine sulfhydryls of tat1-72 (metabolically
labelled with 35S); or (2) the primary amine groups of rhodamine-labelled β-galactosidase and the amino
terminal cysteine sulfhydryl of tat37-72.

For the tat1-72 conjugation, we dissolved 5 mg of β-galactosidase in 0.5 ml of 50 mM sodium phosphate (pH 7.5),
150 mM NaCl, 2 mM MgCl2, and desalted the β-galactosidase on a 9 ml Sephadex G-25 column (Pharmacia LKB,
Piscataway, NJ). We treated the desalted β-galactosidase with an 88-fold molar excess of iodoacetamide at room
temperature for 2 hours, to block free sulfhydryl groups. After removing the unreacted iodoacetamide by gel filtration,
we treated the blocked β-galactosidase with a 10-fold molar excess of SPDP at room temperature. After 2 hours, we
exchanged the buffer by ultrafiltration (Ultrafree 30, Millipore, Bedford, MA). We then added a 4-fold molar excess of
labelled tat1-72, and allowed the cross-linking reaction to proceed overnight, at room temperature. We removed the
unreacted tat1-72 by gel filtration on a 9 ml Sephacryl S-200 column. Using the known specific activity of the labelled
tat1-72, we calculated that there were 1.1 tat1-72 polypeptides cross-linked per β-galactosidase tetramer. Using the
ONPG assay, we found that the conjugated β-galactosidase retained 100% of its enzymatic activity. Using measurement
of cell-incorporated radioactivity and X-gal staining, we demonstrated uptake of the conjugate into cultured HeLa
cells.

For the tat37-72 conjugation, our procedure was as described in the preceding paragraph, except that we labelled
the β-galactosidase with a 5:1 molar ratio of rhodamine maleimide at room temperature for 1 hour, prior to the iodoacetamide
treatment (100:1 iodoacetamide molar excess). In the cross-linking reaction, we used an SPDP ratio of 20:1,
and a tat37-72 ratio of 10:1. We estimated the conjugated product to have about 5 rhodamine molecules (according
to UV absorbance) and about 2 tat37-72 moieties (according to gel filtration) per β-galactosidase tetramer. The conjugate
from this procedure retained about 35% of the initial β-galactosidase enzymatic activity. Using X-gal staining
and rhodamine fluorescence, we demonstrated that the SPDP conjugate was taken up into cultured HeLa cells.

EXAMPLE 3 

Animal Studies with β-Galactosidase Conjugates

For conjugate half-life determination and biodistribution analysis, we injected either 200 ng of SMCC-β-galactosidase
(control) or tat1-72-β-galactosidase intravenously ("IV") into the tail veins of Balb/c mice (Jackson Laboratories),
with and without chloroquine. We collected blood samples at intervals up to 30 minutes. After 30 minutes, we sacrificed
the animals and removed organs and tissues for histochemical analysis.

We measured β-galactosidase activity in blood samples by the ONPG assay. The ONPG assay procedure is described
in detail at pages 16.66-16.67 of Sambrook et al. (supra). β-galactosidase and tat1-72-β-galactosidase were
rapidly cleared from the bloodstream. We estimated their half-lives at 3-6 minutes. These experimental comparisons
indicated that attachment of the tat1-72 transport polypeptide has little or no effect on the clearance rate of β-galactosidase
from the blood.
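
The text does not state how the half-lives were estimated. One common approach, sketched below in Python purely as an illustration, is to assume first-order clearance and fit a straight line to the logarithm of enzyme activity against time. The function name, the sampling times and the activity values are hypothetical; the activities are synthetic numbers generated for a 4.5-minute half-life and are not data from these experiments.

```python
import math


def half_life_minutes(times_min, activities):
    """Least-squares fit of ln(activity) vs. time, assuming first-order decay."""
    logs = [math.log(a) for a in activities]
    n = len(times_min)
    mean_t = sum(times_min) / n
    mean_y = sum(logs) / n
    slope = (
        sum((t - mean_t) * (y - mean_y) for t, y in zip(times_min, logs))
        / sum((t - mean_t) ** 2 for t in times_min)
    )
    # For first-order decay the slope is -ln(2) / half-life.
    return math.log(2) / -slope


# Synthetic example: blood ONPG activity sampled over 30 minutes.
sample_times = [1, 2, 5, 10, 20, 30]
sample_activity = [100.0 * 0.5 ** (t / 4.5) for t in sample_times]

estimate = half_life_minutes(sample_times, sample_activity)
print(f"estimated half-life: {estimate:.1f} minutes")
```

With real measurements the points would scatter about the fitted line, and comparing the fitted half-lives of conjugated and unconjugated enzyme would be one way to express the similarity in clearance rates noted above.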

To detect cellular uptake of the transport polypeptide-β-galactosidase conjugates, we prepared thin frozen tissue
sections from sacrificed animals (above), carried out fixation as described in Example 2 (above), and subjected them
to a standard X-gal staining procedure. Liver, spleen and heart stained intensely. Lung and skeletal muscle stained
less intensely. Brain, pancreas and kidney showed no detectable staining. High power microscopic examination revealed
strong cellular, and in some cases, nuclear staining of what appeared to be endothelial cells surrounding the
blood supply to the tissues.

EXAMPLE 4 

Cellular Uptake Tests with β-Galactosidase-Polyarginine and β-Galactosidase-Polylysine Conjugates

To compare the effectiveness of simple basic amino acid polymers with the effectiveness of our tat-derived transport
polypeptides, we conjugated commercially available polyarginine (Sigma Chem. Co., St. Louis, MO, cat. no. P-4663)
and polylysine (Sigma cat. no. P-2658) to β-galactosidase, as described in Example 2, above. We added the conjugates
to the medium of HeLa cells at 1-30 µg/ml, with and without chloroquine. Following incubation with the conjugates, we
fixed, stained and microscopically examined the cells as described in Example 2, above.

The polylysine-β-galactosidase conjugate gave low levels of surface staining and no nuclear staining. The polyarginine-β-galactosidase
conjugate gave intense overall staining, but showed less nuclear stain than the tat1-72-β-galactosidase
and tat37-72-β-galactosidase conjugates. To distinguish between cell surface binding and actual internalization
of the polyarginine-β-galactosidase conjugate, we treated the cells with trypsin, a protease, prior to the fixing
and staining procedures. Trypsin treatment eliminated most of the X-gal staining of polyarginine-β-galactosidase-treated
cells, indicating that the polyarginine-β-galactosidase conjugate was bound to the outside surfaces of the cells rather
than actually internalized. In contrast, cells exposed to the tat1-72- or tat37-72-β-galactosidase conjugates stained despite
trypsin treatment, indicating that the β-galactosidase cargo was inside the cells and thus protected from trypsin digestion.
Control cells treated with unconjugated β-galactosidase showed no detectable staining.

EXAMPLE 5

Horseradish Peroxidase Conjugates
Chemical Cross-Linking 

To produce tat1-72-HRP and tat37-72-HRP conjugates, we used a commercially-available HRP coupling kit (Immunopure
maleimide activated HRP, Pierce Chem. Co., cat. no. 31498G). The HRP supplied in the kit is in a form that
is selectively reactive toward free -SH groups. (Cysteine is the only one of the 20 protein amino acids having a free
-SH group.) In a transport polypeptide-HRP conjugation experiment involving tat1-72, we produced the tat1-72 starting
material in E. coli and purified it by HPLC, as described in Example 1, above. We lyophilized 200 ng of the purified
tat1-72 (which was dissolved in TFA/acetonitrile) and redissolved it in 100 µl of 100 mM HEPES buffer (pH 7.5), 0.5
mM EDTA. We added 50 µl of the tat1-72 or tat37-72 solution to 50 µl of Immunopure HRP (750 µg of the enzyme) in
250 mM triethanolamine (pH 8.2). We allowed the reaction to proceed for 80 minutes, at room temperature. Under
these conditions, approximately 70% of the HRP was chemically linked to tat1-72 molecules. We monitored the extent
of the linking reaction by SDS-PAGE analysis.

Cellular Uptake of HRP Conjugates

We added the conjugates to the medium of HeLa cells at 20 µg/ml, in the presence or absence of 100 µM chloroquine.
We incubated the cells for 4-18 hours at 37°C/5.5% CO2. We developed the HRP stain using 4-chloro-1-naphthol
(Bio-Rad, Richmond, CA, cat. no. 170-6431) and hydrogen peroxide HRP substrate. In subsequent experiments, we
substituted diaminobenzidine (Sigma Chem. Co., St. Louis, MO) for 4-chloro-1-naphthol.

Cells to which we added transport polypeptide-HRP conjugates displayed cell-associated HRP activity. Short time
periods of conjugate exposure resulted in staining patterns which appeared punctate, probably reflecting HRP in endocytic
vesicles. Following longer incubations, we observed diffuse nuclear and cytoplasmic staining. Control cells
treated with unconjugated HRP showed no detectable staining.

EXAMPLE 6 

PE ADP Ribosylation Domain Conjugates

We cloned and expressed in E. coli the Pseudomonas exotoxin ("PE") both in its full length form and in the form
of its ADP ribosylation domain. We produced transport polypeptide-PE conjugates both by genetic fusion and chemical
cross-linking.

Plasmid Construction 

To construct plasmid pTat70(ApaI), we inserted a unique ApaI site into the tat open reading frame by digesting
pTat72 with BamHI and EcoRI, and inserting a double-stranded linker consisting of the following synthetic oligonucleotides:

GATCCCAGAC CCACCAGGTT TCTCTGTCGG GCCCTTAAG (SEQ ID NO:8)
AATTCTTAAG GGCCCGACAG AGAAACCTGG TGGGTCTGG (SEQ ID NO:9).

The linker replaced the C-terminus of tat, LysGlnStop, with GlyProStop. The linker also added a unique ApaI site
suitable for in-frame fusion of the tat sequence with the PE ADP ribosylation domain-encoding sequences, by means
of the naturally-occurring ApaI site in the PE sequence. To construct plasmid pTat70PE (SEQ ID NO:10), we removed
an ApaI-EcoRI fragment encoding the PE ADP ribosylation domain, from plasmid CD4(181)-PE(392). The construction
of CD4(181)-PE(392) is described by G. Winkler et al. ("CD4-Pseudomonas-Exotoxin Hybrid Proteins: Modulation of
Potency and Therapeutic Window Through Structural Design and Characterization of Cell Internalization", AIDS Research
and Human Retroviruses, 7, pp. 393-401 (1991)). We inserted the ApaI-EcoRI fragment into pTat70(ApaI)
digested with ApaI and EcoRI.
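
The design of the BamHI/EcoRI linker described above can be checked computationally. The following Python sketch, offered for illustration only, anneals the two oligonucleotides in silico and reports the single-stranded ends and the presence of the ApaI recognition sequence (GGGCCC). The function names are hypothetical, and the 4-nucleotide overhang length is an assumption based on the 5' overhangs that BamHI (GATC) and EcoRI (AATT) are known to leave.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def annealed_overhangs(top, bottom, overhang=4):
    """Return the 5' overhangs of both strands if the rest pairs exactly.

    Returns None when the two oligonucleotides are not complementary
    outside their 5' overhangs.
    """
    if top[overhang:] != reverse_complement(bottom[overhang:]):
        return None
    return top[:overhang], bottom[:overhang]


# Oligonucleotides as printed in the text (spaces removed).
top = "GATCCCAGAC CCACCAGGTT TCTCTGTCGG GCCCTTAAG".replace(" ", "")
bottom = "AATTCTTAAG GGCCCGACAG AGAAACCTGG TGGGTCTGG".replace(" ", "")

overhangs = annealed_overhangs(top, bottom)
has_apa1_site = "GGGCCC" in top

print("5' overhangs (top, bottom):", overhangs)
print("ApaI site present:", has_apa1_site)
```

The two strands pair over 35 base pairs, leaving a GATC end compatible with BamHI-cut vector and an AATT end compatible with EcoRI-cut vector, and the duplex carries the ApaI site used for the subsequent in-frame fusion.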

To construct plasmid pTat8PE (SEQ ID NO:11), we removed a 214 base pair NdeI-ApaI fragment from pTat70PE
and replaced it with a double-stranded linker having NdeI and ApaI cohesive termini, encoding tat residues 1-4 and
67-70, and consisting of the following synthetic oligonucleotides:

TATGGAACCG GTCGTTTCTC TGTCGGGCC (SEQ ID NO:12)
CGACAGAGAA ACGACCGGTT CCA (SEQ ID NO:13).
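
As an illustrative check of the coding content of this second linker, the Python sketch below translates the upper oligonucleotide with the standard genetic code, reading from the ATG that begins at its second nucleotide. The function and variable names are hypothetical. The expected result is stated on the assumption that tat residues 1-4 are Met-Glu-Pro-Val and residues 67-70 are Val-Ser-Leu-Ser.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna):
    """Translate complete codons of a DNA sequence (one-letter amino acids)."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Upper linker strand as printed in the text (spaces removed).
linker_top = "TATGGAACCG GTCGTTTCTC TGTCGGGCC".replace(" ", "")

# The reading frame starts at the ATG, i.e. at the second nucleotide.
peptide = translate(linker_top[1:])
print("encoded residues:", peptide)
```

The first eight residues correspond to the two tat segments; the final glycine codon (GGC) overlaps the ApaI end of the linker.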

Purification of TAT8-PE 

Expression of the pTat8-PE construct yielded the PE ADP ribosylation domain polypeptide fused to amino acids
1-4 and 67-70 of tat protein. The pTat8-PE expression product ("tat8-PE") served as the PE ADP ribosylation domain
moiety (and the unconjugated control) in chemical cross-linking experiments described below. Codons for the 8 tat
amino acids were artifacts from a cloning procedure selected for convenience. The 8 tat amino acids fused to the PE
ADP ribosylation domain had no transport activity (Figure 2).

For purification of tat8-PE, we suspended 4.5 g of pTat8-PE-transformed E. coli in 20 ml of 50 mM Tris-HCl (pH
8.0), 2 mM EDTA. We lysed the cells in a French press and removed insoluble debris by centrifugation at 10,000 rpm
for 1 hour, in an SA600 rotor. Most of the tat8-PE was in the supernatant. We loaded the supernatant onto a 3 ml
Q-Sepharose Fast Flow (Pharmacia LKB, Piscataway, NJ) ion exchange column. After loading the sample, we washed
the column with 50 mM Tris-HCl (pH 8.0), 2 mM EDTA. After washing the column, we carried out step gradient elution,
using the same buffer with 100, 200 and 400 mM NaCl. The tat8-PE eluted with 200 mM NaCl. Following the ion
exchange chromatography, we further purified the tat8-PE by gel filtration on a Superdex 75 FPLC column (Pharmacia
LKB, Piscataway, NJ). We equilibrated the gel filtration column with 50 mM HEPES (pH 7.5). We then loaded the
sample and carried out elution with the equilibration buffer at 0.34 ml/min. We collected 1.5-minute fractions and stored
the tat8-PE fractions at -70°C.

Crosslinking of TAT8-PE

Since the PE ADP ribosylation domain has no cysteine residues, we used sulfo-SMCC (Pierce Chem. Co., Rockford,
IL, cat. no. 22322G) for transport polypeptide-tat8-PE conjugation. We carried out the conjugation in a 2-step
reaction procedure. In the first reaction step, we treated tat8-PE (3 mg/ml), in 50 mM HEPES (pH 7.5), with 10 mM
sulfo-SMCC, at room temperature, for 40 minutes. (The sulfo-SMCC was added to the reaction as a 100 mM stock
solution in 1 M HEPES, pH 7.5.) We separated the tat8-PE-sulfo-SMCC from the unreacted sulfo-SMCC by gel filtration
on a P6DG column (Bio-Rad, Richmond, CA) equilibrated with 25 mM HEPES (pH 6.0), 25 mM NaCl. In the second
reaction step, we allowed the tat8-PE-sulfo-SMCC (1.5 mg/ml in 100 mM HEPES (pH 7.5), 1 mM EDTA) to react with
purified tat37-72 (600 µM final conc.) at room temperature, for 1 hour. To stop the cross-linking reaction, we added
cysteine. We analyzed the cross-linking reaction products by SDS-PAGE. About 90% of the tat8-PE became cross-linked
to the tat37-72 transport polypeptide under these conditions. Approximately half of the conjugated product had
one transport polypeptide moiety, and half had two transport polypeptide moieties.

Cell-Free Assay for PE ADP Ribosylation 

To verify that the PE ribosylation domain retained its biological activity (i.e., destructive ribosome modification)
following conjugation to transport polypeptides, we tested the effect of transport polypeptide-PE ADP ribosylation conjugates
on in vitro (i.e., cell-free) translation. For each in vitro translation experiment, we made up a fresh translation
cocktail and kept it on ice. The in vitro translation cocktail contained 200 µl rabbit reticulocyte lysate (Promega, Madison,
WI), 2 µl 10 mM ZnCl2 (optional), 4 µl of a mixture of the 20 protein amino acids except methionine, and 20 µl 35S-methionine.
To 9 µl of translation cocktail we added from 1 to 1000 ng of transport polypeptide-PE conjugate (preferably
in a volume of 1 µl) or control, and pre-incubated the mixture for 60 minutes at 30°C. We then added 0.5 µl BMV RNA
to each sample and incubated for an additional 60 minutes at 30°C. We stored the samples at -70°C after adding 5 µl
of 50% glycerol per sample. We analyzed the in vitro translation reaction products by SDS-PAGE techniques. We
loaded 2 µl of each translation reaction mixture (plus an appropriate volume of SDS-PAGE sample buffer) per lane on
the SDS gels. After electrophoresis, we visualized the 35S-containing in vitro translation products by fluorography.

Using the procedure described in the preceding paragraph, we found that the PE ADP ribosylation domain genetically
fused to the tat1-70 transport polypeptide had no biological activity, i.e., did not inhibit in vitro translation. In
contrast, using the same procedure, we found that the PE ADP ribosylation domain chemically cross-linked to the
tat37-72 transport polypeptide had retained full biological activity, i.e., inhibited in vitro translation as well as the non-conjugated
PE ADP ribosylation domain controls (Figure 2).

Cytotoxicity Assay for PE ADP Ribosylation

In a further test involving the tat37-72-PE ADP ribosylation domain conjugate, we added it to cultured HeLa cells
in the presence or absence of 100 µM chloroquine. We then assayed cytotoxicity by measuring in vivo protein synthesis,
as indicated by trichloroacetic acid ("TCA")-precipitable radioactivity in cell extracts.

We performed the cytotoxicity assay as follows. We disrupted HeLa cell layers, centrifuged the cells and resuspended
them at a density of 2.5 x 10^/ml of medium. We used 0.5 ml of suspension/well when using 24 well plates, or
0.25 ml of suspension/well when using 48 well plates. We added conjugates or unconjugated controls, dissolved in
100 µl of PBS, to the wells after allowing the cells to settle for at least 4 hours. We incubated the cells in the presence
of conjugates or controls for 60 minutes, at 37°C, then added 0.5 ml of fresh medium to each well, and incubated the
cells for an additional 5-24 hours. Following this incubation, we removed the medium from each well and washed the
cells once with about 0.5 ml PBS. We then added 1 µCi of 35S-methionine (Amersham, in vivo cell labelling grade,
SJ.1015) per 100 µl per well, and incubated the cells for 2 hours. After two hours, we removed the radioactive medium
and washed the cells 3 times with cold 5% TCA and then once with PBS. We added 100 µl of 0.5 M NaOH to each
well and allowed at least 45 minutes for cell lysis and protein dissolving to take place. We then added 50 µl of 1 M HCl
to each well and transferred the entire contents of each well into scintillation fluid for liquid scintillation measurement
of radioactivity.
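
The scintillation counts obtained in this assay are typically expressed relative to untreated control wells. The Python sketch below illustrates one way such a calculation might be made; it is not the analysis actually used, which the text does not describe. The function names, the optional background correction, the doses and the count values are all hypothetical.

```python
def percent_of_control(sample_cpm, control_cpm, background_cpm=0.0):
    """TCA-precipitable counts in a treated well as a percentage of control."""
    return 100.0 * (sample_cpm - background_cpm) / (control_cpm - background_cpm)


def inhibition_curve(doses, sample_cpms, control_cpm, background_cpm=0.0):
    """Pair each dose with its percent inhibition of protein synthesis."""
    return [
        (dose, 100.0 - percent_of_control(cpm, control_cpm, background_cpm))
        for dose, cpm in zip(doses, sample_cpms)
    ]


# Hypothetical counts per minute for increasing conjugate doses.
doses_ug_per_ml = [0.1, 1.0, 10.0]
counts = [9000.0, 5000.0, 1000.0]
curve = inhibition_curve(doses_ug_per_ml, counts, control_cpm=10000.0)

for dose, inhibition in curve:
    print(f"{dose:5.1f} µg/ml: {inhibition:.0f}% inhibition")
```

A dose-dependent effect would appear as inhibition values that rise with conjugate dose while the unconjugated control stays near zero.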

In the absence of chloroquine, there was a clear dose-dependent inhibition of cellular protein synthesis in response
to treatment with the transport polypeptide-PE ADP ribosylation domain conjugate, but not in response to treatment
with the unconjugated PE ADP ribosylation domain. The results are summarized in Figure 2. When conjugated to
tat37-72, the PE ADP ribosylation domain appeared to be transported 3- to 10-fold more efficiently than when conjugated
to tat1-72. We also conjugated transport polypeptides tat38-58GGC, tat37-58, tat47-58GGC and tatCGG47-58 to the
PE ADP ribosylation domain. All of these conjugates resulted in cellular uptake of biologically active PE ADP ribosylation
domain (data not shown).

EXAMPLE 7

Ribonuclease Conjugates
Chemical Cross-Linking

We dissolved 7.2 mg of bovine pancreatic ribonuclease A, Type 12A (Sigma Chem. Co., St. Louis, MO, cat. no.
R5500) in 200 µl PBS (pH 7.5). To the ribonuclease solution, we added 1.4 mg sulfo-SMCC (Pierce Chem. Co., Rockford,
IL, cat. no. 22322H). After vortex mixing, we allowed the reaction to proceed at room temperature for 1 hour. We
removed unreacted SMCC from the ribonuclease-SMCC by passing the reaction mixture over a 9 ml P6DG column
(Bio-Rad, Richmond, CA) and collecting 0.5 ml fractions. We identified the void volume peak fractions (containing the
ribonuclease-SMCC conjugate) by monitoring UV absorbance at 280 nm. We divided the pooled ribonuclease-SMCC-containing
fractions into 5 equal aliquots. To each of 4 ribonuclease-SMCC aliquots, we added a chemically-synthesized
transport polypeptide corresponding to tat residues: 37-72 ("37-72"); 38-58 plus GGC at the carboxy terminal
("38-58GGC"); 37-58 ("37-58"); or 47-58 plus CGG at the amino terminal ("CGG47-58"). We allowed the transport
polypeptide-ribonuclease conjugation reactions to proceed for 2 hours at room temperature, and then overnight at 4°C.
We analyzed the reaction products by SDS-PAGE on a 10-20% gradient gel. The cross-linking efficiency was approximately
60% for transport polypeptides tat38-58GGC, tat37-58 and tatCGG47-58, and 40% for tat37-72. Of the modified
species, 72% contained one, and 25% contained 2 transport polypeptide substitutions.

Cellular Uptake of Tat37-72-Ribonuclease Conjugates

We maintained cells at 37°C in a tissue culture incubator in Dulbecco's Modified Eagle Medium supplemented
with 10% donor calf serum and penicillin/streptomycin. For cellular uptake assays, we plated 10^ cells in a 24-well
plate and cultured them overnight. We washed the cells with Dulbecco's PBS and added the ribonuclease conjugate
dissolved in 300 µl of PBS containing 80 µM chloroquine, at concentrations of 0, 10, 20, 40 and 80 µg/ml. After a 1.25
hour incubation at 37°C, we added 750 µl of growth medium and further incubated the cell samples overnight. After
the overnight incubation, we washed the cells once with PBS and incubated them for 1 hour in Minimal Essential
Medium without methionine (Flow Labs) (250 µl/well) containing 35S-methionine (1 µCi/well). After the 1 hour incubation
with radioactive methionine, we removed the medium and washed the cells three times with 5% TCA (1 ml/well/wash). We
then added 250 µl of 0.5 M NaOH per well. After 1 hour at room temperature, we pipetted 200 µl of the contents of
each well into a scintillation vial, and added 100 µl of 1 M HCl and 4 ml of scintillation fluid. After thorough mixing of the
contents of each vial, we measured radioactivity in each sample by liquid scintillation counting.
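
Two pieces of arithmetic are implicit in the uptake protocol above: the HCl added to each scintillation vial matches the NaOH carried over from the well, and the conjugate is diluted when growth medium is added after the 1.25 hour incubation. The Python sketch below works through both. It is an illustration only; it ignores any residual TCA or liquid left in the wells, and the variable names are hypothetical.

```python
def micromoles(volume_ul: float, molarity: float) -> float:
    """Microliters times mol/l gives micromoles."""
    return volume_ul * molarity


# Neutralization in the scintillation vial.
naoh_umol = micromoles(200, 0.5)   # 200 ul of the 0.5 M NaOH lysate
hcl_umol = micromoles(100, 1.0)    # 100 ul of 1 M HCl

# Dilution of the conjugate when 750 ul of growth medium is added to 300 ul.
dose_ug_per_ml = [0, 10, 20, 40, 80]
dilution = 300 / (300 + 750)
overnight_ug_per_ml = [dose * dilution for dose in dose_ug_per_ml]

print(f"NaOH: {naoh_umol:.0f} umol, HCl: {hcl_umol:.0f} umol")
for dose, diluted in zip(dose_ug_per_ml, overnight_ug_per_ml):
    print(f"{dose:>3} ug/ml during uptake -> {diluted:.1f} ug/ml overnight")
```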

The cellular uptake results are summarized in Figure 3. Transport polypeptide tat38-58GGC functioned as well
as, or slightly better than, tat37-72. Transport polypeptide tatCGG47-58 had reduced activity (data not shown). We do
not know whether this polypeptide had reduced uptake activity or whether the proximity of the basic region to the
ribonuclease interfered with enzyme activity.

We have used cation exchange chromatography (BioCAD perfusion chromatography system, PerSeptive
Biosystems) to purify ribonuclease conjugates having one or two transport polypeptide moieties.

EXAMPLE 8

Protein Kinase A Inhibitor Conjugates

Chemical Cross-Linking 

We purchased the protein kinase A inhibitor ("PKAI") peptide (20 amino acids) from Bachem California (Torrance,
CA). For chemical cross-linking of PKAI to transport polypeptides, we used either sulfo-MBS (at 10 mM) or sulfo-SMPB
(at 15 mM). Both of these cross-linking reagents are heterobifunctional for thiol groups and primary amine groups.
Since PKAI lacks lysine and cysteine residues, both sulfo-MBS and sulfo-SMPB selectively target cross-linking to the
amino terminus of PKAI. We reacted PKAI at a concentration of 2 mg/ml, in the presence of 50 mM HEPES (pH 7.5),
25 mM NaCl, at room temperature, for 50 minutes, with either cross-linking reagent. The sulfo-MBS reaction mixture
contained 10 mM sulfo-MBS and 20% DMF. The sulfo-SMPB reaction mixture contained 15 mM sulfo-SMPB and 20%
dimethylsulfoxide ("DMSO"). We purified the PKAI-cross-linker adducts by reverse phase HPLC, using a C4 column.
We eluted the samples from the C4 column in a 20-75% acetonitrile gradient containing 0.1% trifluoroacetic acid. We
removed the acetonitrile and trifluoroacetic acid from the adducts by lyophilization and redissolved them in 25 mM
HEPES (pH 6.0), 25 mM NaCl. We added tat1-72 or tat37-72 and adjusted the pH of the reaction mixture to 7.5 by
adding 1 M HEPES (pH 7.5) to 100 mM. We then allowed the cross-linking reaction to proceed at room temperature
for 60 minutes.
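
The pH adjustment above amounts to a stock dilution: enough 1 M HEPES (pH 7.5) is added for that buffer to reach 100 mM in the final mixture. The Python sketch below solves for the volume to add, allowing for the fact that the addition itself increases the total volume. The 100 µl starting volume is a hypothetical figure chosen for illustration, and the calculation assumes that "to 100 mM" refers to the added pH 7.5 buffer alone, ignoring the 25 mM HEPES (pH 6.0) already present.

```python
def stock_volume_ul(start_volume_ul: float, stock_mM: float, final_mM: float) -> float:
    """Volume of stock to add so that the added component reaches final_mM.

    Solves stock_mM * v = final_mM * (start_volume_ul + v) for v.
    """
    if final_mM >= stock_mM:
        raise ValueError("final concentration must be below the stock concentration")
    return final_mM * start_volume_ul / (stock_mM - final_mM)


reaction_volume_ul = 100.0  # hypothetical volume before pH adjustment
hepes_ul = stock_volume_ul(reaction_volume_ul, stock_mM=1000.0, final_mM=100.0)

# Back-calculate the concentration actually reached, as a sanity check.
final_check_mM = 1000.0 * hepes_ul / (reaction_volume_ul + hepes_ul)

print(f"add {hepes_ul:.1f} ul of 1 M HEPES (pH 7.5) per {reaction_volume_ul:.0f} ul of reaction")
print(f"resulting HEPES (pH 7.5) concentration: {final_check_mM:.0f} mM")
```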

We regulated the extent of cross-linking by altering the transport polypeptide:PKAI ratio. We analyzed the
cross-linking reaction products by SDS-PAGE. With tat37-72, a single new electrophoretic band formed in the cross-linking
reactions. This result was consistent with the addition of a single tat37-72 molecule to a single PKAI molecule. With
tat1-72, six new products formed in the cross-linking reactions. This result is consistent with the addition of multiple
PKAI molecules per tat1-72 polypeptide, as a result of the multiple cysteine residues in tat1-72. When we added PKAI
to the cross-linking reaction in large molar excess, we obtained only conjugates containing 5 or 6 PKAI moieties per
tat1-72.

In Vitro Phosphorylation Assay for PKAI Activity 


To test the sulfo-MBS-cross-linked conjugates for retention of PKAI biological activity, we used an in vitro
phosphorylation assay. In this assay, histone V served as the substrate for phosphorylation by protein kinase A in the
presence or absence of PKAI (or a PKAI conjugate). We then used SDS-PAGE to monitor PKAI-dependent differences
in the extent of phosphorylation. In each reaction, we incubated 5 units of the catalytic subunit of protein kinase A
(Sigma) with varying amounts of PKAI or PKAI conjugate, at 37°C, for 30 minutes. The assay reaction mixture contained
24 mM sodium acetate (pH 6.0), 25 mM MgCl2, 100 mM DTT, 50 µCi of [γ-32P]ATP and 2 µg of histone V, in a total
reaction volume of 40 µl. Using this assay, we found that PKAI conjugated to tat1-72 or tat37-72 inhibited
phosphorylation as well as unconjugated PKAI (data not shown).

Cellular Assay

To test for cellular uptake of PKAI and transport polypeptide-PKAI conjugates, we employed cultured cells
containing a chloramphenicol acetyltransferase ("CAT") reporter gene under the control of a cAMP-responsive expression
control sequence. We thus quantified protein kinase A activity indirectly, by measuring CAT activity. This assay has
been described in detail by J. R. Grove et al. ("Probing cAMP-Related Gene Expression with a Recombinant Protein
Kinase Inhibitor", Molecular Aspects of Cellular Regulation, Vol. 6, P. Cohen and J. G. Folkes, eds., Elsevier Scientific,
Amsterdam, pp. 173-95 (1991)).

Using this assay, we found no activity by PKAI or any of the transport polypeptide-PKAI conjugates. This result
suggested to us that the PKAI moiety might be undergoing rapid degradation upon entry into the cells.

Cross-Linking of PKAI to Tat37-72-β-Galactosidase

We had previously found cellular uptake of tat37-72-β-galactosidase to be chloroquine-independent (Example 2,
above). Therefore, we cross-linked PKAI to tat37-72-β-galactosidase for possible protection of PKAI against rapid
degradation.

We treated β-galactosidase with 20 mM DTT (a reducing agent) at room temperature for 30 minutes and then
removed the DTT by gel filtration on a G50 column in MES buffer (pH 5). We allowed the reduced β-galactosidase to
react with SMPB-activated PKAI (above), at pH 6.5, for 60 minutes. To block residual free sulfhydryl groups, we added
N-ethylmaleimide or iodoacetamide. SDS-PAGE analysis showed that at least 95% of the β-galactosidase had been
conjugated. About 90% of the conjugated β-galactosidase product contained one PKAI moiety per subunit, and
about 10% contained 2 PKAI moieties. We treated the PKAI-β-galactosidase conjugate with a 10-fold molar excess of
sulfo-SMCC. We then reacted the PKAI-β-galactosidase-SMCC with tat1-72. According to SDS-PAGE analysis, the
PKAI-β-galactosidase:tat1-72 ratio appeared to be 1:0.5. We have produced about 100 µg of the final product. Because
of precipitation problems, the concentration of the final product in solution has been limited to 100 µg/ml.

EXAMPLE 9 

E2 Repressor Conjugates

To test cellular uptake and E2 repressor activity of transport polypeptide-E2 repressor conjugates, we
simultaneously transfected an E2-dependent reporter plasmid and an E2 expression plasmid into SV40-transformed African
green monkey kidney ("COS7") cells. Then we exposed the transfected cells to transport polypeptide-E2 repressor
conjugates (made by genetic fusion or chemical cross-linking) or to appropriate controls. The repression assay,
described below, was essentially as described in Barsoum et al. (supra).

Repression Assay Cells 

We obtained the COS7 cells from the American Type Culture Collection, Rockville, MD (ATCC No. CRL 1651).
We propagated the COS7 cells in Dulbecco's modified Eagle's medium (GIBCO, Grand Island, NY) with 10% fetal
bovine serum (JRH Biosciences, Lenexa, KS) and 4 mM glutamine ("growth medium"). Cell incubation conditions were
5.5% CO2 at 37°C.

Repression Assay Plasmids

Our E2-dependent reporter plasmid, pXB332hGH, contained a human growth hormone reporter gene driven by a 
truncated SV40 early promoter having 3 upstream E2 binding sites. We constructed the hGH reporter plasmid, 
pXB332hGH, as described in Barsoum et al. (supra).

For expression of a full-length HPV E2 gene, we constructed plasmid pAHE2 (Figure 4). Plasmid pAHE2 contains
the E2 gene from HPV strain 16, operatively linked to the adenovirus major late promoter augmented by the SV40
enhancer, upstream of the promoter. We isolated the HPV E2 gene from plasmid pHPV16 (the full-length HPV16
genome cloned into pBR322), described in M. Durst et al., "A Papillomavirus DNA from Cervical Carcinoma and Its
Prevalence in Cancer Biopsy Samples from Different Geographic Regions", Proc. Natl. Acad. Sci. USA, 80, pp. 3812-15
(1983), as a Tth111I-AseI fragment. Tth111I cleaves at nucleotide 2711, and AseI cleaves at nucleotide 3929 in the
HPV16 genome. We blunted the ends of the Tth111I-AseI fragment in a DNA polymerase I Klenow reaction, and ligated
BamHI linkers (New England Biolabs, cat. no. 1021). We inserted this linker-bearing fragment into BamHI-cleaved
plasmid pBG331, to create plasmid pAHE2.
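
The two cleavage coordinates quoted above fix the approximate size of the E2-bearing fragment before linker addition. The Python sketch below is a minimal illustration of that subtraction; the function name is hypothetical, and the result is approximate because the exact length depends on where within each recognition site the enzyme cuts and on how the overhangs are filled in or trimmed during blunting.

```python
def fragment_length_bp(left_cut: int, right_cut: int) -> int:
    """Approximate length of a linear fragment between two cut coordinates."""
    if right_cut <= left_cut:
        raise ValueError("right_cut must lie downstream of left_cut")
    return right_cut - left_cut


# Cleavage coordinates in the HPV16 genome, as quoted in the text.
e2_fragment_bp = fragment_length_bp(2711, 3929)

# Upper bound on the number of complete codons such a fragment could carry.
max_codons = e2_fragment_bp // 3

print(f"fragment length: about {e2_fragment_bp} bp")
print(f"room for at most {max_codons} codons")
```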

Plasmid pBG331 is the same as pBG312 (R.L. Cate et al., "Isolation of the Bovine and Human Genes for Mullerian
Inhibiting Substance and Expression of the Human Gene in Animal Cells", Cell, 45, pp. 685-98 (1986)) except that it
lacks the BamHI site downstream of the SV40 polyadenylation signal, making the BamHI site between the promoter
and the SV40 intron unique. We removed the unwanted BamHI site by partial BamHI digestion of pBG312, gel
purification of the linearized plasmid, blunt end formation by DNA polymerase I Klenow treatment, self-ligation and screening
for plasmids with the desired deletion of the BamHI site.

Bacterial Production of E2 Repressor Proteins 

One of our E2 repressor proteins, E2.123, consisted of the carboxy-terminal 121 amino acids of HPV16 E2 with
MetVal added at the amino terminus. We also used a variant of E2.123, called E2.123CCSS. E2.123 has cysteine
residues at HPV16 E2 amino acid positions 251, 281, 300 and 309. In E2.123CCSS, the cysteine residues at positions
300 and 309 were changed to serine, and the lysine residue at position 299 was changed to arginine. We replaced the
cysteine residues at positions 300 and 309, so that cysteine-dependent chemical cross-linking could take place in the
amino terminal portion of the E2 repressor, but not in the E2 minimal DNA binding/dimerization domain. We considered
crosslinks in the minimal DNA binding domain likely to interfere with the repressor's biological activity.

For construction of plasmid pET8c-123 (Figure 5; SEQ ID NO:14), we produced the necessary DNA fragment by
standard polymerase chain reaction ("PCR") techniques, with plasmid pHPV16 as the template. (For a general
discussion of PCR techniques, see Chapter 14 of Sambrook et al., supra. Automated PCR equipment and chemicals are
commercially available.) The nucleotide sequence of EA52, the PCR oligonucleotide primer for the 5' end of the 374
base pair E2-123 fragment, is set forth in the Sequence Listing under SEQ ID NO:15. The nucleotide sequence of
EA54, the PCR oligonucleotide primer used for the 3' end of the E2-123 fragment, is set forth in the Sequence Listing
under SEQ ID NO:16. We digested the PCR products with NcoI and BamHI and cloned the resulting fragment into
NcoI/BamHI-digested expression plasmid pET8c (Studier et al., supra), to create plasmid pET8c-123.

By using the same procedure with a different 5' oligonucleotide PCR primer, we obtained a 260 base pair fragment
("E2-85") containing a methionine codon and an alanine codon immediately followed by codons for the carboxy-terminal
83 amino acids of HPV16 E2. The nucleotide sequence of EA57, the PCR 5' primer for producing E2-85, is set forth
in the Sequence Listing under SEQ ID NO:34.

To construct plasmid pET8c-123CCSS (Figure 6; SEQ ID NO:17), for bacterial production of E2.123CCSS, we
synthesized an 882 bp PstI-EagI DNA fragment by PCR techniques. The PCR template was pET8c-123. One of the
PCR primers, called 374.140, encoded all three amino acid changes:
GGACAGTGGA GTATAGAATG TAGAATGCTT TTTAAATCTA TATCTTAAAG ATGTTAAAG (SEQ ID NO:18). The other
PCR primer, 374.18, had the following sequence: GGGTCGGGGG GGATGGCGGG GATAAT (SEQ ID NO:19). We
digested the PCR reaction products with PstI plus EagI and isolated the 882 bp fragment by standard methods. The
final step was production of pET8c-123CCSS in a 3-piece ligation joining a 3424 bp EcoRI-EagI fragment from
pET8c-123 with the 882 bp PCR fragment and a 674 bp PstI-EcoRI pET8c-123 fragment, as shown in Figure 6. We verified
the construction by DNA sequence analysis. For production of E2.123 and E2.123CCSS proteins, we expressed
plasmids pET8c-123 and pET8c-123CCSS in E. coli strain BL21(DE3)pLysS, as described by Studier (supra).
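
The three-piece ligation described above can be sanity-checked by confirming that the fragment ends pair up and by summing the fragment sizes. The Python sketch below is an illustration under simple assumptions: each fragment is represented only by the names of its two end sites and its length, and the pairing test is a necessary condition for circularization rather than a full model of ligation. The function and variable names are hypothetical.

```python
from collections import Counter

# (end site, end site, length in bp) for the three fragments named in the text.
fragments = [
    ("EcoRI", "EagI", 3424),  # vector fragment from pET8c-123
    ("PstI", "EagI", 882),    # PCR fragment carrying the three codon changes
    ("PstI", "EcoRI", 674),   # second pET8c-123 fragment
]


def ends_pair_up(frags) -> bool:
    """True if every end site occurs exactly twice, so each end has a partner."""
    ends = Counter(site for left, right, _ in frags for site in (left, right))
    return all(count == 2 for count in ends.values())


plasmid_size_bp = sum(length for _, _, length in fragments)

print(f"ends pair up: {ends_pair_up(fragments)}")
print(f"expected size of the ligated plasmid: {plasmid_size_bp} bp")
```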

Purification of E2 Repressor Proteins

We thawed 3.6 grams of frozen, pET8c-123-transformed E. coli cells and suspended them in 35 ml of 25 mM
Tris-HCl (pH 7.5), 0.5 mM EDTA, 2.5 mM DTT, plus protease inhibitors (1 mM PMSF, 3 mM benzamidine, 50 µg/ml pepstatin
A, 10 µg/ml aprotinin). We lysed the cells by two passages through a French press at 10,000 psi. We centrifuged the
lysate at 12,000 rpm, in an SA600 rotor, for 1 hour. The E2.123 protein was in the supernatant. To the supernatant,
we added MES buffer (pH 6) up to 25 mM, MES buffer (pH 5) up to 10 mM, and NaCl up to 125 mM. We then applied
the supernatant to a 2 ml S Sepharose Fast Flow column at 6 ml/hr. After loading, we washed the column with 50 mM
Tris-HCl (pH 7.5), 1 mM DTT. We then carried out step gradient elution (2 ml/step) with 200, 300, 400, 500, 700 and
1000 mM NaCl in 50 mM Tris-HCl (pH 7.5), 1 mM DTT. The E2.123 repressor protein eluted in the 500 and 700 mM
NaCl fractions. SDS-PAGE analysis indicated the E2.123 repressor purity exceeded 95%.

We thawed 3.0 grams of frozen, pET8c-123CCSS-transformed E. coli and suspended the cells in 30 ml of the same
buffer used for pET8c-123-transformed cells (above). Lysis, removal of insoluble cellular debris and addition of MES
buffer and NaCl were also as described for purification of E2.123. The purification procedure for E2.123CCSS diverged
after addition of the MES buffer and NaCl, because a precipitate formed, with E2.123CCSS, at that point in the
procedure. We removed the precipitate by centrifugation, and found that it and the supernatant both contained substantial
E2 repressor activity. Therefore, we subjected both to purification steps. We applied the supernatant to a 2 ml S
Sepharose Fast Flow column (Pharmacia LKB, Piscataway, NJ) at 6 ml/hr. After loading, we washed the column with 50 mM
Tris-HCl (pH 7.5), 1 mM DTT. After washing the column, we carried out step gradient elution (2 ml/step), using 300,
400, 500, 700 and 1000 mM NaCl in 50 mM Tris-HCl (pH 7.5), 1 mM DTT. The E2.123CCSS protein eluted with 700
mM NaCl. SDS-PAGE analysis indicated its purity to exceed 95%. We dissolved the E2.123CCSS precipitate in 7.5
ml of 25 mM Tris-HCl (pH 7.5), 125 mM NaCl, 1 mM DTT and 0.5 mM EDTA. We loaded the dissolved material onto
a 2 ml S Sepharose Fast Flow column and washed the column as described for E2.123 and non-precipitated
E2.123CCSS. We carried out step gradient elution (2 ml/step), using 300, 500, 700 and 1000 mM NaCl. The E2
repressor eluted in the 500-700 mM NaCl fractions. SDS-PAGE analysis indicated its purity to exceed 98%. Immediately
following purification of the E2.123 and E2.123CCSS proteins, we added glycerol to a final concentration of 15%
(v/v), and stored flash-frozen (liquid N2) aliquots at -70°C. We quantified the purified E2 repressor proteins by UV
absorbance at 280 nm, using an extinction coefficient of 1.8 at 1 mg/ml.

Chemical Cross-Linking

We performed chemical synthesis of the transport polypeptide consisting of tat amino acids 37-72, as described
in Example 1. We dissolved the polypeptide (5 mg/ml) in 10 mM MES buffer (pH 5.0), 50 mM NaCl, 0.5 mM EDTA
(extinction coefficient of 0.2 at 1 mg/ml). To the transport polypeptide solution, we added a bismaleimidohexane ("BMH")
(Pierce Chemical Co., Rockford, IL, cat. no. 22319G) stock solution (6.25 mg/ml DMF) to a final concentration of 1.25
mg/ml, and a pH 7.5 HEPES buffer stock solution (1 M) to a final concentration of 100 mM. We allowed the BMH to
react with the protein for 30 minutes at room temperature. We then separated the protein-BMH from unreacted BMH
by gel filtration on a G-10 column equilibrated in 10 mM MES (pH 5), 50 mM NaCl, 0.5 mM EDTA. We stored aliquots
of the transport polypeptide-BMH conjugate at -70°C.
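
The extinction coefficients quoted in this example (1.8 at 1 mg/ml for the E2 repressor proteins and 0.2 at 1 mg/ml for the tat37-72 polypeptide) convert an absorbance reading at 280 nm directly into a protein concentration. The Python sketch below shows that conversion. The absorbance readings and the dilution factor are hypothetical values chosen for illustration, and a 1 cm path length is assumed.

```python
def concentration_mg_per_ml(a280: float, ext_coeff_at_1_mg_ml: float,
                            dilution_factor: float = 1.0) -> float:
    """Protein concentration from A280, given the absorbance of a 1 mg/ml solution.

    Assumes a 1 cm path length and a reading in the linear range.
    """
    if ext_coeff_at_1_mg_ml <= 0:
        raise ValueError("extinction coefficient must be positive")
    return a280 / ext_coeff_at_1_mg_ml * dilution_factor


E2_EXT_COEFF = 1.8        # from the text, E2 repressor proteins
TAT37_72_EXT_COEFF = 0.2  # from the text, tat37-72 polypeptide

# Hypothetical readings, for illustration only.
e2_mg_per_ml = concentration_mg_per_ml(0.9, E2_EXT_COEFF)
tat_mg_per_ml = concentration_mg_per_ml(0.25, TAT37_72_EXT_COEFF, dilution_factor=4.0)

print(f"E2 repressor: {e2_mg_per_ml:.2f} mg/ml")
print(f"tat37-72:     {tat_mg_per_ml:.2f} mg/ml")
```

Because the tat37-72 coefficient is nine times smaller, the same absorbance corresponds to a much higher polypeptide concentration, which is why the reading for a 5 mg/ml solution is taken here on a diluted sample.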

For cross-linking of the transport polypeptide-BMH conjugate to the E2 repressor, we removed the E2 repressor
protein from its storage buffer. We diluted the E2 repressor protein with three volumes of 25 mM MES (pH 6.0), 0.5
mM EDTA and batch-loaded it onto S Sepharose Fast Flow (Pharmacia LKB, Piscataway, NJ) at 5 mg protein per ml
resin. After pouring the slurry of protein-loaded resin into a column, we washed the column with 25 mM MES (pH 6.0),
0.5 mM EDTA, 250 mM NaCl. We then eluted the bound E2 repressor protein from the column with the same buffer
containing 800 mM NaCl. We diluted the E2 repressor-containing eluate to 1 mg/ml with 25 mM MES (pH 6.0), 0.5 mM
EDTA. From trial cross-linking studies performed with each batch of E2 repressor protein and BMH-activated transport
polypeptide, we determined that treating 1 mg of E2 repressor protein with 0.6 mg of BMH-activated transport
polypeptide yields the desired incorporation of 1 transport molecule per E2 repressor homodimer. Typically, we mixed 2 ml of
E2 repressor (1 mg/ml) with 300 µl of tat37-72-BMH (4 mg/ml) and 200 µl of 1 M HEPES (pH 7.5). We allowed the
cross-linking reaction to proceed for 30 minutes at room temperature. We terminated the cross-linking reaction by
adding 2-mercaptoethanol to a final concentration of 14 mM. We determined the extent of cross-linking by SDS-PAGE
analysis. We stored aliquots of the tat37-72-E2 repressor conjugate at -70°C. We employed identical procedures to
chemically cross-link the tat37-72 transport polypeptide to the HPV E2.123 repressor protein and the HPV E2.123CCSS
repressor protein.
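
The typical reaction mix given above can be checked against the stated target of 0.6 mg of BMH-activated transport polypeptide per mg of E2 repressor. The Python sketch below does this arithmetic and also estimates the HEPES concentration in the mix. It reads the HEPES volume as 200 µl, which is an assumption about a unit that is garbled in the source text, and the variable names are hypothetical.

```python
# Components of the typical cross-linking mix (volumes in ml, protein in mg/ml).
e2_volume_ml, e2_mg_per_ml = 2.0, 1.0
tat_volume_ml, tat_mg_per_ml = 0.3, 4.0
hepes_volume_ml, hepes_stock_mM = 0.2, 1000.0  # 200 ul of 1 M HEPES (assumed unit)

e2_mg = e2_volume_ml * e2_mg_per_ml
tat_mg = tat_volume_ml * tat_mg_per_ml
tat_per_e2 = tat_mg / e2_mg  # mg transport polypeptide per mg E2 repressor

total_volume_ml = e2_volume_ml + tat_volume_ml + hepes_volume_ml
hepes_final_mM = hepes_stock_mM * hepes_volume_ml / total_volume_ml

print(f"transport polypeptide per mg E2 repressor: {tat_per_e2:.2f} mg")
print(f"total volume: {total_volume_ml:.1f} ml")
print(f"HEPES (pH 7.5) in the mix: {hepes_final_mM:.0f} mM")
```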

Cellular Uptake of E2 Repressor Conjugates 

For our E2 repression assays, we used transient expression of plasmids transfected into COS7 cells. Our E2
repression assay procedure was similar to that described in Barsoum et al. (supra). We transfected 4x10^ COS7 cells
(about 50% confluent at the time of harvest) by electroporation, in two separate transfections ("EP1" and "EP2"). In
transfection EP1, we used 20 µg pXB332hGH (reporter plasmid) plus 380 µg sonicated salmon sperm carrier DNA
(Pharmacia LKB, Piscataway, NJ). In transfection EP2, we used 20 µg pXB332hGH plus 30 µg pAHE2 (E2
transactivator) and 350 µg salmon sperm carrier DNA. We carried out electroporations with a Bio-Rad Gene Pulser, at 270
volts, 960 µFD, with a pulse time of about 11 msec. Following the electroporations, we seeded the cells in 6-well dishes,
at 2 x 10^5 cells per well. Five hours after the electroporations, we aspirated the growth medium, rinsed the cells with
growth medium and added 1.5 ml of fresh growth medium to each well. At this time, we added chloroquine ("CQ") to
a final concentration of 80 µM (or a blank solution to controls). Then we added tat37-72 cross-linked to E2.123 ("TxHE2")
or tat37-72 cross-linked to E2.123CCSS ("TxHE2CCSS"). The final concentration of these transport polypeptide-cargo
conjugates was 6, 20 or 60 µg/ml of cell growth medium (Table I).
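
The two electroporations above differ in plasmid content but are balanced with carrier DNA. The Python sketch below tabulates the DNA amounts given in the text and confirms that the total DNA per transfection is the same in both, so that any difference in hGH expression is not attributable to the DNA load. The dictionary layout is an illustrative choice, not part of the protocol.

```python
# DNA amounts in micrograms for each electroporation, as given in the text.
transfections = {
    "EP1": {"pXB332hGH": 20, "salmon sperm carrier": 380},
    "EP2": {"pXB332hGH": 20, "pAHE2": 30, "salmon sperm carrier": 350},
}

totals = {name: sum(dna.values()) for name, dna in transfections.items()}

for name, total in totals.items():
    print(f"{name}: {total} ug total DNA")
```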



TABLE I

Identification of Samples

well       CQ (µM)    protein (µg/ml)
EP1.1      0          0
EP1.2      80         0
EP2.1      0          0
EP2.2      0          6 TxHE2
EP2.3      0          20 TxHE2
EP2.4      0          60 TxHE2
EP2.5      0          6 TxHE2CCSS
EP2.6      0          20 TxHE2CCSS
EP2.7      0          60 TxHE2CCSS
EP2.8      80         0
EP2.9      80         6 TxHE2
EP2.10     80         20 TxHE2
EP2.11     80         60 TxHE2
EP2.12     80         6 TxHE2CCSS
EP2.13     80         20 TxHE2CCSS
EP2.14     80         60 TxHE2CCSS

After an 18-hour incubation, we removed the medium, rinsed the cells with fresh medium, and added 1.5 ml of
fresh medium containing the same concentrations of chloroquine and transport polypeptide-cargo conjugates as in the
preceding 18-hour incubation. This medium change was to remove any hGH that may have been present before the
repressor entered the cells. Twenty-four hours after the medium change, we harvested the cells and performed cell
counts to check for viability. We then assayed for hGH on undiluted samples of growth medium according to the method
of Seldon, described in Protocols in Molecular Biology, Green Publishing Associates, New York, pp. 9.7.1-9.7.2 (1987),
using the Allegro Human Growth Hormone transient gene expression system kit (Nichols Institute, San Juan
Capistrano, CA). We subtracted the assay background (i.e., assay components with non-conditioned medium added) from
the hGH cpm, for all samples. We performed separate percentage repression calculations for a given protein treatment,
according to whether chloroquine was present ("(+)CQ") or absent ("(-)CQ") in the protein uptake test. We calculated
percentage repression according to the following formula:

$$\text{Repression (\%)} = \frac{(\mathrm{ACT} - \mathrm{BKG}) - (\mathrm{REP} - \mathrm{BKG})}{\mathrm{ACT} - \mathrm{BKG}} \times 100$$

where: 

BKG = hGH cpm in the transfections of reporter alone (e.g., EP1.1 for (-)CQ and EP1.2 for (+)CQ);

ACT = hGH cpm in the transfection of reporter plus transactivator, but to which no repressor conjugate was added
(e.g., EP2.1 for (-)CQ and EP2.8 for (+)CQ);

REP = hGH cpm in the transfection of reporter plus transactivator, to which a repressor conjugate was added
(e.g., EP2.2-2.7 for (-)CQ and EP2.9-2.14 for (+)CQ).
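
The percentage repression calculation defined above can be written as a short function. The Python sketch below is an illustration, not part of the original assay software. It takes the three hGH counts after subtraction of the assay background and applies the formula as stated; the function name is hypothetical, and the example inputs are the EP1.1, EP2.1 and EP2.2 values and the EP1.2, EP2.8 and EP2.14 values from Table II.

```python
def percent_repression(act_cpm: float, rep_cpm: float, bkg_cpm: float) -> float:
    """Percentage repression of E2-dependent hGH expression.

    act_cpm: reporter plus transactivator, no repressor conjugate (ACT)
    rep_cpm: reporter plus transactivator, repressor conjugate added (REP)
    bkg_cpm: reporter alone (BKG)
    All inputs are cpm after subtraction of the assay background.
    """
    induced = act_cpm - bkg_cpm
    if induced <= 0:
        raise ValueError("ACT must exceed BKG for repression to be defined")
    return ((act_cpm - bkg_cpm) - (rep_cpm - bkg_cpm)) / induced * 100.0


# (-)CQ series: BKG = EP1.1, ACT = EP2.1, REP = EP2.2
no_cq = percent_repression(act_cpm=15011, rep_cpm=12671, bkg_cpm=3808)

# (+)CQ series: BKG = EP1.2, ACT = EP2.8, REP = EP2.14
with_cq = percent_repression(act_cpm=14970, rep_cpm=6547, bkg_cpm=5251)

print(f"EP2.2:  {no_cq:.1f}% repression")
print(f"EP2.14: {with_cq:.1f}% repression")
```

Because BKG cancels in the numerator, the expression reduces to (ACT - REP) / (ACT - BKG) x 100; the longer form is kept in the function so that it mirrors the definition in the text.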


Data from a representative E2 repression assay are shown in Table II. Table I identifies the various samples represented 
in Table II. Figure 7 graphically depicts the results presented in Table II. 



TABLE II

E2 Repression Assay

sample     hGH cpm    cpm - assay bkgd    cpm - BKG    % repression
EP1.1      3958       3808
EP1.2      5401       5251
EP2.1      15,161     15,011              11,203
EP2.2      12,821     12,671              8863         20.9
EP2.3      10,268     10,118              6310         43.7
EP2.4      8496       8346                4538         59.5
EP2.5      11,934     11,784              7976         28.8
EP2.6      9240       9090                5282         52.9
EP2.7      7926       7776                3968         64.6
EP2.8      15,120     14,970              9719
EP2.9      12,729     12,579              7328         24.6
EP2.10     9590       9440                4189         56.9
EP2.11     8440       8290                3039         68.7
EP2.12     11,845     11,695              6444         33.7
EP2.13     8175       8025                2774         71.5
EP2.14     6697       6547                1296         86.7

Transport polypeptide tat37-72 cross-linked to either E2 repressor (E2.123 or E2.123CCSS) resulted in a
dose-dependent inhibition of E2-dependent gene expression in the cultured mammalian cells (Table II; Figure 7). We have
repeated this experiment four times, with similar results. The effect was E2-specific, in that other tat37-72 conjugates
had no effect on E2 induction of pXB332hGH (data not shown). Also, the tat37-72xHE2 conjugates had no effect on
the hGH expression level of a reporter in which the expression of the hGH gene was driven by a constitutive promoter
which did not respond to E2. The E2 repressor with the CCSS mutation repressed to a greater degree than the repressor
with the wild-type amino acid sequence. This was as expected, because cross-linking of the transport polypeptide to
either of the last two cysteines in the wild-type repressor would likely reduce or eliminate repressor activity. Chloroquine
was not required for the repression activity. However, chloroquine did enhance repression in all of the tests. These
results are summarized in Table II and Figure 7.

EXAMPLE 10

TATΔcys Conjugates

Production of TatΔcys

For bacterial production of a transport polypeptide consisting of tat amino acids 1-21 fused directly to tat amino
acids 38-72, we constructed expression plasmid pTATΔcys (Figure 8; SEQ ID NO:20). To construct plasmid pTATΔcys,
we used conventional PCR techniques, with plasmid pTAT72 as the PCR template. One of the oligonucleotide primers
used for the PCR was 374.18 (SEQ ID NO:19), which covers the EagI site upstream of the tat coding sequence. (We
also used oligonucleotide 374.18 in the construction of plasmid pET8c-123CCSS. See Example 9.) The other
oligonucleotide primer for the PCR, 374.28, covers the EagI site within the tat coding sequence and has a deletion of the
tat DNA sequence encoding amino acids 22-37. The nucleotide sequence of 374.28 is: TTTAGGGCGG TAAGAGATAC
GTAGGGCTTT GGTGATGAAC GCGGT (SEQ ID NO:21). We digested the PCR products with EagI and isolated the
resulting 762 base pair fragment. We inserted that EagI fragment into the 4057 base pair vector produced by EagI
cleavage of pTAT72. We verified the construction by DNA sequence analysis and expressed the tatΔcys polypeptide
by the method of Studier et al. (supra). SDS-PAGE analysis showed the tatΔcys polypeptide to have the correct size.
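
The residue bookkeeping for the deletion polypeptide described above (tat amino acids 1-21 fused directly to amino acids 38-72, with residues 22-37 removed) can be laid out explicitly. The Python sketch below is a minimal illustration that counts tat-derived residues only; it does not account for any additional residues contributed by the expression vector, and the helper name is hypothetical.

```python
def residue_range(first: int, last: int) -> list[int]:
    """Inclusive list of residue numbers from first to last."""
    return list(range(first, last + 1))


kept = residue_range(1, 21) + residue_range(38, 72)  # retained tat residues
deleted = residue_range(22, 37)                      # removed by primer 374.28

print(f"tat residues retained: {len(kept)}")
print(f"tat residues deleted:  {len(deleted)}")
print(f"junction: residue {kept[20]} is followed directly by residue {kept[21]}")
```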

For purification of tatΔcys protein, we thawed 4.5 grams of pTATΔcys-transformed E. coli cells and resuspended the
cells in 35 ml of 20 mM MES (pH 6.2), 0.5 mM EDTA. We lysed the cells by two passes through a French press, at
10,000 psi. We removed insoluble debris by centrifugation at 10,000 rpm in an SA600 rotor, for 1 hour. We applied the
supernatant to a 5 ml S Sepharose Fast Flow column at 15 ml/hr. We washed the column with 50 mM Tris-HCl (pH
7.5), 0.3 mM DTT. We then carried out step gradient elution (2 ml/step) with the same buffer containing 300, 400, 500,
700 and 950 mM NaCl. The tatΔcys protein eluted in the 950 mM NaCl fraction.

We conjugated a tatΔcys transport polypeptide to rhodamine isothiocyanate and tested it by assaying directly for
cellular uptake. The results were positive (similar to results in related experiments with tat1-72).

TATΔcys-249 Genetic Fusion



For bacterial expression of the tatΔcys transport polypeptide genetically fused to the amino terminus of the native
E2 repressor protein (i.e., the carboxy-terminal 249 amino acids of BPV-1 E2), we constructed plasmid pTATΔcys-249
as follows. We constructed plasmid pFTE501 (Figure 9) from plasmids pTAT72 (Frankel and Pabo, supra) and pXB314
(Barsoum et al., supra). From plasmid pXB314, we isolated the NcoI-SpeI DNA fragment encoding the 249 amino acid
BPV-1 E2 repressor. (NcoI cleaves at nucleotide 296, and SpeI cleaves at nucleotide 1118 of pXB314.) We blunted
the ends of this fragment by DNA polymerase I Klenow treatment and added a commercially available BglII linker (New
England Biolabs, cat. no. 1090). We inserted this linker-bearing fragment into BamHI-cleaved (complete digestion)
plasmid pTAT72. In pTAT72, there is a BamHI cleavage site within the tat coding region, near its 3' end, and a second
BamHI cleavage site slightly downstream of the tat gene. The BglII linker joined the tat and E2 coding sequences in
frame to encode a fusion of the first 62 amino acids of tat protein followed by a serine residue and the last 249 amino
acids of BPV-1 E2 protein. We designated this bacterial expression plasmid pFTE501 (Figure 9). To construct plasmid
pTATΔcys-249 (Figure 10; SEQ ID NO:22), we inserted the 762 base pair EagI fragment from plasmid pTATΔcys, which
includes the portion of tat containing the cysteine deletion, into the 4812 base pair EagI fragment of plasmid pFTE501.

Purification of TatΔcys-249

We thawed 5 g of E. coli expressing tatΔcys-249 and suspended the cells in 40 ml of 25 mM Tris-HCl (pH 7.5), 25
mM NaCl, 0.5 mM EDTA, 5 mM DTT, plus protease inhibitors (1.25 mM PMSF, 3 mM benzamidine, 50 µg/ml pepstatin
A, 50 µg/ml aprotinin, 4 µg/ml E64). We lysed the cells by two passages through a French pressure cell at 10,000 psi.
We removed insoluble debris from the lysate by centrifugation at 12,000 rpm in an SA600 rotor, for 1 hour. We purified
the tatΔcys-249 from the soluble fraction. We loaded the supernatant onto a 2 ml S Sepharose Fast Flow column
(Pharmacia LKB, Piscataway, NJ) at a flow rate of 6 ml/hr. We washed the column with 25 mM Tris-HCl (pH 7.5), 25
mM NaCl, 0.5 mM EDTA, 1 mM DTT and treated it with sequential salt steps in the same buffer containing 100, 200,
300, 400, 500, 600 and 800 mM NaCl. We recovered the tatΔcys-249 in the 600-800 mM salt fractions. We pooled
the peak fractions, added glycerol to 15%, and stored aliquots at -70°C.

Immunofluorescence Assay 

To analyze cellular uptake of the tatΔcys-E2 repressor fusion protein, we used indirect immunofluorescence
techniques. We seeded HeLa cells onto cover slips in 6-well tissue culture dishes, to 50% confluence. After an overnight
incubation, we added the tatΔcys-E2 repressor fusion protein (1 µg/ml final concentration) and chloroquine (0.1 mM
final concentration). After six hours, we removed the fusion protein/chloroquine-containing growth medium and washed
the cells twice with PBS. We fixed the washed cells in 3.5% formaldehyde at room temperature. We permeabilized the
fixed cells with 0.2% Triton X-100/2% bovine serum albumin ("BSA") in PBS containing 1 mM MgCl2/0.1 mM CaCl2
("PBS+") for 5 minutes at room temperature. To block the permeabilized cells, we treated them with PBS containing
2% BSA, for 1 hour at 4°C.

We incubated the cover slips with 20 µl of a primary antibody solution in each well, at a 1:100 dilution in PBS+
containing 2% BSA, for 1 hour at 4°C. The primary antibody was either a rabbit polyclonal antibody to the BPV-1 E2
repressor (generated by injecting the purified carboxy-terminal 85 amino acids of E2), or a rabbit polyclonal antibody
to tat (generated by injecting the purified amino-terminal 72 amino acids of tat protein). We added a secondary antibody
at a 1:100 dilution in 0.2% Tween-20/2% BSA in PBS+ for 30 minutes at 4°C.

The secondary antibody was a rhodamine-conjugated goat anti-rabbit IgG (Cappel no. 2212-0081). Following
incubation of the cells with the secondary antibody, we washed the cells with 0.2% Tween-20/2% BSA in PBS+ and
mounted the cover slips in 90% glycerol, 25 mM sodium phosphate (pH 7.2), 150 mM NaCl. We examined the cells
with a fluorescence microscope having a rhodamine filter.

Cellular Uptake of TatΔcys Fusions

We observed significant cellular uptake of the tatΔcys-E2 repressor fusion protein, using either the tat antibody or
the E2 antibody. In control cells exposed to the unconjugated tat protein, we observed intracellular fluorescence using
the tat antibody, but not the E2 antibody. In control cells exposed to a mixture of the unconjugated E2 repressor and
tat protein or tatΔcys, we observed fluorescence using the tat antibody, but not the E2 antibody. This verified that tat
mediates E2 repressor uptake only when the E2 repressor is linked to the tat protein. As with unconjugated tat protein,
we observed the tatΔcys-E2 repressor fusion protein throughout the cells, but it was concentrated in intracellular
vesicles. These results show that a tat-derived polypeptide completely lacking cysteine residues can carry a heterologous
protein (i.e., transport polypeptide-cargo protein genetic fusion) into animal cells.

In a procedure similar to that described above, we produced a genetic fusion of tatΔcys to the C-terminal 123
amino acids of HPV E2. When added to the growth medium, this fusion polypeptide exhibited repression of E2-dependent
gene expression in COS7 cells (data not shown).

EXAMPLE 11

Antisense Oligodeoxynucleotide Conjugates

Using an automated DNA/RNA synthesizer (Applied Biosystems model 394), we synthesized DNA phosphorothioate
analogs (4-18 nucleotides in length), with each containing a free amino group at the 5' end. The amine group
was incorporated into the oligonucleotides using commercially modified nucleotides (aminolink 2, Applied Biosystems).
The oligonucleotides corresponded to sense and antisense strands from regions of human growth hormone and CAT
messenger RNA.

For each cross-linking reaction, we dissolved 200 µg of an oligonucleotide in 100 µl of 25 mM sodium phosphate
buffer (pH 7.0). We then added 10 µl of a 50 mM stock solution of sulfo-SMCC and allowed the reaction to proceed at
room temperature for 1 hour. We removed unreacted sulfo-SMCC by gel filtration of the reaction mixture on a P6DG
column (Bio-Rad) in 25 mM HEPES (pH 6.0). We dried the oligonucleotide-sulfo-SMCC adduct under a vacuum.
Recovery of the oligonucleotides in this procedure ranged from 58 to 95%. For reaction with a transport polypeptide, we
redissolved each oligonucleotide-sulfo-SMCC adduct in 50 µl of 0.5 mM EDTA, transferred the solution to a test tube
containing 50 µg of lyophilized transport polypeptide, and allowed the reaction to proceed at room temperature for 2
hours. We analyzed the reaction products by SDS-PAGE.
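
As a rough check on the cross-linking conditions above, the final sulfo-SMCC concentration in the reaction follows from simple dilution. The sketch below is illustrative only: the function and variable names are hypothetical, and it assumes the two volumes are additive. Only the volumes and the stock concentration are taken from the text.

```python
def final_concentration_mM(stock_mM: float, stock_ul: float, sample_ul: float) -> float:
    """Concentration after adding stock_ul of stock to sample_ul of sample.

    Assumes the volumes are simply additive (an approximation).
    """
    total_ul = stock_ul + sample_ul
    return stock_mM * stock_ul / total_ul


# 10 µl of 50 mM sulfo-SMCC added to 100 µl of oligonucleotide solution
smcc_mM = final_concentration_mM(stock_mM=50.0, stock_ul=10.0, sample_ul=100.0)

# mM multiplied by µl gives nmol
smcc_nmol = 50.0 * 10.0

print(f"{smcc_mM:.2f} mM sulfo-SMCC in the reaction, {smcc_nmol:.0f} nmol in total")
```

The molar excess of cross-linker over oligonucleotide would additionally depend on the molecular weight of each oligonucleotide, which the text does not state.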

EXAMPLE 12 

Antibody Conjugates

Anti-Tubulin conjugate 1

We obtained commercial mouse IgG1 mAb anti-tubulin (Amersham) and purified it from ascites by conventional
methods, using protein A. We labelled the purified antibody with rhodamine isothiocyanate, at 1.2 moles rhodamine/
mole Ab. When we exposed fixed, permeabilized HeLa cells to the labelled antibody, microscopic examination revealed
brightly stained microtubules. Although the rhodamine labelling was sufficient, we enhanced the antibody signal with
anti-mouse FITC.

In a procedure essentially as described in Example 2 (above), we allowed 250 µg of the antibody to react with a
10:1 molar excess of sulfo-SMCC. We then added 48 µg of (35S-labelled) tat1-72. The molar ratio of tat1-72:Ab was
2.7:1. According to incorporation of radioactivity, the tat1-72 was cross-linked to the antibody in a ratio of 0.6:1.

For analysis of uptake of the tat1-72-Ab conjugate, we added the conjugate to medium (10 µg/ml) bathing cells
grown on coverslips. We observed a punctate pattern of fluorescence in the cell. The punctate pattern indicated
vesicular location of the conjugate, and was therefore inconclusive as to cytoplasmic delivery.

To demonstrate immunoreactivity of the conjugated antibody, we tested its ability to bind tubulin. We coupled
purified tubulin to cyanogen bromide-activated Sepharose 4B (Sigma Chem. Co., St. Louis, MO). We applied a sample
of the radioactive conjugate to the tubulin column (and to a Sepharose 4B control column) and measured the amount
of bound conjugate. More radioactivity bound to the affinity matrix than to the control column, indicating tubulin binding
activity.

Anti-Tubulin conjugate 2 

In a separate cross-linking experiment, we obtained an anti-tubulin rat monoclonal antibody IgG2a (Serotec), and
purified it from ascites by conventional procedures, using protein G. We eluted the antibody with Caps buffer (pH 10).
The purified antibody was positive in a tubulin-binding assay. We allowed tat1-72 to react with rhodamine isothiocyanate
at a molar ratio of 1:1. The reaction product exhibited an absorbance ratio (A555/A280) which indicated a substitution of
approximately 0.75 mole of dye per mole of tat1-72. Upon separation of the unreacted dye from the tat1-72-rhodamine,
by G-25 gel filtration (Pharmacia LKB, Piscataway, NJ), we recovered only 52 µg out of 150 µg of tat1-72 used in the
reaction.
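
The recovery figure quoted above can be expressed as a percentage. This is a minimal illustration using only the two masses given in the text; the helper name is hypothetical.

```python
def percent_recovery(recovered_ug: float, input_ug: float) -> float:
    """Percentage of the input mass recovered after a purification step."""
    if input_ug <= 0:
        raise ValueError("input mass must be positive")
    return 100.0 * recovered_ug / input_ug


# 52 µg of tat1-72 recovered from the 150 µg used in the labelling reaction
recovery = percent_recovery(recovered_ug=52.0, input_ug=150.0)
print(f"{recovery:.1f}% of the tat1-72 was recovered after G-25 gel filtration")
```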

We saved an aliquot of the tat1-72-rhodamine for use (as a control) in cellular uptake experiments, and added the
rest to 0.4 mg of antibody that had reacted with SMCC (20:1). The reaction mixture contained a tat1-72:Ab ratio of
approximately 1:1, rather than the intended 5:1. (In a subsequent experiment, the 5:1 ratio turned out to be
unsatisfactory, yielding a precipitate.) We allowed the cross-linking reaction to proceed overnight at 4°C. We then added a
molar excess of cysteine to block the remaining maleimide groups and thus stop the cross-linking reaction. We
centrifuged the reaction mixtures to remove any precipitate present.

We carried out electrophoresis using a 4-20% polyacrylamide gradient gel to analyze the supernatant under
reducing and non-reducing conditions. We also analyzed the pellets by this procedure. In supernatants from
antibody-tat1-72 (without rhodamine) conjugation experiments, we observed very little material on the 4-20% gel. However, in
supernatants from the antibody-tat1-72-rhodamine conjugation experiments, we observed relatively heavy bands
above the antibody, for the reduced sample. The antibody appeared to be conjugated to the tat1-72 in a ratio of
approximately 1:1.

In cellular uptake experiments carried out with conjugate 2 (procedure as described above for conjugate 1), we
obtained results similar to those obtained with conjugate 1. When visualizing the conjugate by rhodamine fluorescence
or by fluorescein associated with a second antibody, we observed the conjugate in vesicles.

EXAMPLE 13 

Additional Tat-E2 Conjugates

Chemically Cross-Linked Tat-E2 Conjugates 

We chemically cross-linked transport polypeptide tat37-72 to four different repressor forms of E2. The four E2
repressor moieties used in these experiments were the carboxy-terminal 103 residues (i.e., 308-410) of BPV-1
("E2.103"); the carboxy-terminal 249 residues (i.e., 162-410) of BPV-1 ("E2.249"); the carboxy-terminal 121 residues
(i.e., 245-365) of HPV-16 ("HE2"); and the carboxy-terminal 121 residues of HPV-16, in which the cysteine residues
at positions 300 and 309 were changed to serine, and the lysine residue at position 299 was changed to arginine
("HE2CCSS"). The recombinant production and purification of HE2 and HE2CCSS, followed by chemical cross-linking
of HE2 and HE2CCSS to tat37-72, to form TxHE2 and TxHE2CCSS, respectively, are described in Example 9 (above).
For the chemical cross-linking of E2.103 and E2.249 to tat37-72 (to yield the conjugates designated TxE2.103 and
TxE2.249), we employed the same method used to make TxHE2 and TxHE2CCSS (Example 9, supra).
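
The residue ranges quoted for the E2 repressor moieties can be checked against the stated fragment lengths, since an inclusive range first-last contains last - first + 1 residues. The following sketch is illustrative; the dictionary layout and function name are hypothetical, while the numbers are those given in the paragraph above.

```python
def residue_count(first: int, last: int) -> int:
    """Number of residues in the inclusive range first..last."""
    return last - first + 1


# name: (first residue, last residue, length stated in the text)
e2_fragments = {
    "E2.103": (308, 410, 103),  # BPV-1 E2
    "E2.249": (162, 410, 249),  # BPV-1 E2
    "HE2": (245, 365, 121),     # HPV-16 E2
}

for name, (first, last, stated) in e2_fragments.items():
    n = residue_count(first, last)
    verdict = "matches" if n == stated else "differs from"
    print(f"{name}: residues {first}-{last} = {n} residues ({verdict} the stated {stated})")
```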

We expressed the protein E2.103 in E. coli from plasmid pET-E2.103. We obtained pET-E2.103 by a PCR cloning
procedure analogous to that used to produce pET8c-123, described in Example 9 (above) and Figure 5. As in the
construction of pET8c-123, we ligated a PCR-produced NcoI-BamHI E2 fragment into NcoI-BamHI-cleaved pET8c.
Our PCR template for the E2 fragment was plasmid pCO-E2 (Hawley-Nelson et al., EMBO J., vol 7, pp. 525-31 (1988);
United States patent 5,219,990). The oligonucleotide primers used to produce the E2 fragment from pCO-E2 were
EA21 (SEQ ID NO:36) and EA22 (SEQ ID NO:37). Primer EA21 introduced an NcoI site that added a methionine
codon followed by an alanine codon 5' adjacent to the coding region for the carboxy-terminal 101 residues of BPV-1 E2.

We expressed the protein E2.249 in E. coli from plasmid pET8c-249. We constructed pET8c-249 by inserting the
1362 bp NcoI-BamHI fragment of plasmid pXB314 (Figure 9) into NcoI-BamHI-cleaved pET8c (Figure 5).

TATΔcys-BPV E2 Genetic Fusions

In addition to TATΔcys-249, we tested several other TATΔcys-BPV-1 E2 repressor fusions. Plasmid pTATΔcys-105
encoded tat residues 1-21 and 38-67, followed by the carboxy-terminal 105 residues of BPV-1. Plasmid pTATΔcys-161
encoded tat residues 1-21 and 38-62, followed by the carboxy-terminal 161 residues of BPV-1. We constructed
plasmids pTATΔcys-105 and pTATΔcys-161 from intermediate plasmids pFTE103 and pFTE403, respectively.

We produced pFTE103 and pFTE403 (as well as pFTE501) by ligating different inserts into BamHI-cleaved
(complete digestion) vector pTAT72.

To obtain the insertion fragment for pFTE103, we isolated a 929 base pair PleI-BamHI fragment from pXB314 and
ligated it to a double-stranded linker consisting of synthetic oligonucleotide FTE.3 (SEQ ID NO:23) and synthetic
oligonucleotide FTE.4 (SEQ ID NO:24). The linker encoded tat residues 61-67 and had a BamHI overhang at the 5' end
and a PleI overhang at the 3' end. We ligated the linker-bearing fragment from pXB314 into BamHI-cleaved pTAT72,
to obtain pFTE103. To obtain the insertion fragment for pFTE403, we digested pXB314 with NcoI and SpeI, generated
blunt ends with Klenow treatment and ligated a BglII linker consisting of GAAGATCTTC (New England Biolabs, Beverly,
MA, Cat. No. 1090) (SEQ ID NO:35) duplexed with itself. We purified the resulting 822-base pair fragment by
electrophoresis and then ligated it into BamHI-digested pTAT72 vector, to obtain pFTE403.

To delete tat residues 22-37, thereby obtaining plasmid pTATΔcys-105 from pFTE103 and pTATΔcys-161 from
pFTE403, we employed the same method (described above) used to obtain plasmid pTATΔcys-249 from pFTE501.

TATΔcys-HPV E2 Genetic Fusions

We constructed plasmids pTATΔcys-HE2.85 and pTATΔcys-HE2.121 to encode a fusion protein consisting of the
tatΔcys transport moiety (tat residues 1-21, 38-72) followed by the carboxy-terminal 85 or 121 residues of HPV-16,
respectively.

Our starting plasmids in the construction of pTATΔcys-HE2.85 and pTATΔcys-HE2.121 were, respectively,
pET8c-85 and pET8c-123 (both described above). We digested pET8c-85 and pET8c-123 with BglII and NcoI, and isolated
the large fragment in each case (4769 base pairs from pET8c-85 or 4880 base pairs from pET8c-123) for use as a
vector. In both vectors, the E2 coding regions begin at the NcoI site. Into both vectors, we inserted the 220 bp
BglII-AatII fragment from plasmid pTATΔcys, and a synthetic fragment. The 5' end of the BglII-AatII fragment is upstream
of the T7 promoter and encodes the first 40 residues of tatΔcys (i.e., residues 1-21, 38-56). The synthetic fragment,
consisting of annealed oligonucleotides 374.67 (SEQ ID NO:25) and 374.68 (SEQ ID NO:26), encoded tat residues
57-72, with an AatII overhang at the 5' end and an NcoI overhang at the 3' end.
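
Because the deletion polypeptide omits tat residues 22-37, a position counted along the polypeptide no longer equals the native tat residue number. The sketch below, with hypothetical names, maps one numbering onto the other for the transport moiety described above (tat residues 1-21 and 38-72) and can be used to confirm that its first 40 residues end at native residue 56.

```python
# Native tat residues retained in the deletion polypeptide (inclusive ranges)
TAT_DELTA_CYS_SEGMENTS = [(1, 21), (38, 72)]


def native_residue(position: int, segments=TAT_DELTA_CYS_SEGMENTS) -> int:
    """Map a 1-based position in the deletion polypeptide to native tat numbering."""
    if position < 1:
        raise ValueError("position must be 1 or greater")
    remaining = position
    for first, last in segments:
        length = last - first + 1
        if remaining <= length:
            return first + remaining - 1
        remaining -= length
    raise ValueError("position lies beyond the end of the polypeptide")


total_residues = sum(last - first + 1 for first, last in TAT_DELTA_CYS_SEGMENTS)

print("total tat-derived residues:", total_residues)
print("position 21 ->", native_residue(21))
print("position 22 ->", native_residue(22))
print("position 40 ->", native_residue(40))
```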



JB Series of Genetic Fusions

Plasmid pJB106 encodes a fusion protein (Figure 12) (SEQ ID NO:38) in which an amino-terminal methionine
residue is followed by tat residues 47-58 and then HPV-16 E2 residues 245-365. To obtain pJB106, we carried out a
three-way ligation, schematically depicted in Figure 11. We generated a 4602 base pair vector fragment by digesting
plasmid pET8c with NcoI and BamHI. One insert was a 359 base pair MspI-BamHI fragment from pET8c-123, encoding
HPV-16 E2 residues 248-365. The other insert was a synthetic fragment consisting of the annealed oligonucleotide
pair, 374.185 (SEQ ID NO:27) and 374.186 (SEQ ID NO:28). The synthetic fragment encoded the amino-terminal
methionine and tat residues 47-58, plus HPV16 residues 245-247 (i.e., ProAspThr). The synthetic fragment had an
NcoI overhang at the 5' end and an MspI overhang at the 3' end.
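
The two inserts in this three-way ligation should together account for every codon of the JB106 fusion protein. The following bookkeeping sketch checks that; the variable names are hypothetical, and the check on the 359 base pair fragment only confirms that it is long enough to carry the stated codons, not its exact composition.

```python
def count(first: int, last: int) -> int:
    """Number of residues in the inclusive range first..last."""
    return last - first + 1


# Synthetic fragment: initiator Met + tat 47-58 + HPV-16 E2 245-247
synthetic_codons = 1 + count(47, 58) + count(245, 247)

# MspI-BamHI fragment: HPV-16 E2 248-365
msp_bam_codons = count(248, 365)

# Full fusion as described: Met + tat 47-58 + HPV-16 E2 245-365
expected_length = 1 + count(47, 58) + count(245, 365)

assembled_length = synthetic_codons + msp_bam_codons
print("codons in synthetic fragment:", synthetic_codons)
print("codons in MspI-BamHI fragment:", msp_bam_codons)
print("assembled fusion length:", assembled_length, "residues")
print("consistent with description:", assembled_length == expected_length)

# The 359 bp fragment must be at least as long as the codons it carries
print("359 bp fragment long enough:", 3 * msp_bam_codons <= 359)
```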

We obtained plasmids pJB117 (SEQ ID NO:59), pJB118 (SEQ ID NO:60), pJB119 (SEQ ID NO:61), pJB120 (SEQ
ID NO:62) and pJB122 (SEQ ID NO:63) by PCR deletion cloning in a manner similar to that used for pTATΔcys
(described above and in Figure 8). We constructed plasmids pJB117 and pJB118 by deleting segments of
pTATΔcys-HE2.121. We constructed plasmids pJB119 and pJB120 by deleting segments of pTATΔcys-161. In all four clonings,
we used PCR primer 374.122 (SEQ ID NO:29) to cover the HindIII site downstream of the tat-E2 coding region. In
each case, the other primer spanned the NdeI site at the start of the tatΔcys coding sequence, and deleted codons for
residues at the beginning of tatΔcys (i.e., residues 2-21 and 38-46 for pJB117 and pJB119; and residues 2-21 for
pJB118 and pJB120). For deletion of residues 2-21, we used primer 379.11 (SEQ ID NO:30). For deletion of residues
2-21 and 38-46, we used primer 379.12 (SEQ ID NO:31). Following the PCR reaction, we digested the PCR products
with NdeI and HindIII. We then cloned the resulting restriction fragments into vector pTATΔcys-HE2.121, which had
been previously digested with NdeI plus HindIII to yield a 4057 base pair receptor fragment. Thus, we constructed
expression plasmids encoding fusion proteins consisting of amino acid residues as follows:

JB117 = Tat47-72-HPV16 E2 245-365;
JB118 = Tat38-72-HPV16 E2 245-365;
JB119 = Tat47-62-BPV1 E2 250-410; and
JB120 = Tat38-62-BPV1 E2 250-410.

We constructed pJB122, encoding tat residues 38-58 followed by HPV16 E2 residues 245-365 (i.e., the E2
carboxy-terminal 121 amino acids), by deleting from pJB118 codons for tat residues 59-72. We carried out this deletion by PCR,
using primer 374.13 (SEQ ID NO:32), which covers the AatII site within the tat coding region, and primer 374.14 (SEQ
ID NO:33), which covers the AatII site slightly downstream of the unique HindIII site downstream of the Tat-E2 gene.
We digested the PCR product with AatII and isolated the resulting restriction fragment. In the final pJB122 construction
step, we inserted the isolated AatII fragment into AatII-digested vector pJB118.

It should be noted that in all five of our pJB constructs described above, the tat coding sequence was preceded 
by a methionine codon for initiation of translation. 


Purification of Tat-E2 Fusion Proteins 

In all cases, we used E. coli to express our tat-E2 genetic fusions. Our general procedure for tat-E2 protein
purification included the following initial steps: pelleting the cells; resuspending them in 8-10 volumes of lysis buffer (25 mM
Tris (pH 7.5), 25 mM NaCl, 1 mM DTT, 0.5 mM EDTA) containing protease inhibitors (generally, 1 mM PMSF, 4 µg/ml
E64, 50 µg/ml aprotinin, 50 µg/ml pepstatin A, and 3 mM benzamidine); lysing the cells in a French press (2 passes
at 12,000 psi); and centrifuging the lysates at 10,000-12,000 x g for 1 hour (except FTE proteins), at 4°C. Additional
steps employed in purifying particular tat-E2 fusion proteins are described below.
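
For readers preparing the lysis buffer base from concentrated stocks, the volumes follow from C1V1 = C2V2. This is only a sketch: the final concentrations are those given above, but the stock concentrations are hypothetical assumptions that should be replaced with whatever stocks are actually on hand, and the protease inhibitors are left out.

```python
# Final concentrations from the text (mM)
FINAL_mM = {"Tris (pH 7.5)": 25.0, "NaCl": 25.0, "DTT": 1.0, "EDTA": 0.5}

# Hypothetical stock concentrations (mM); these are assumptions, not from the source
ASSUMED_STOCK_mM = {"Tris (pH 7.5)": 1000.0, "NaCl": 5000.0, "DTT": 1000.0, "EDTA": 500.0}


def stock_volumes_ml(total_ml: float) -> dict:
    """Volume of each stock (ml) needed for total_ml of buffer, plus water to volume."""
    volumes = {
        name: total_ml * FINAL_mM[name] / ASSUMED_STOCK_mM[name]
        for name in FINAL_mM
    }
    volumes["water"] = total_ml - sum(volumes.values())
    return volumes


recipe = stock_volumes_ml(100.0)
for component, ml in recipe.items():
    print(f"{component}: {ml:.2f} ml")
```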

E2.103 and E2.249 - Following centrifugation of the lysate, we loaded the supernatant onto a Fast S Sepharose
column and eluted the E2.103 or E2.249 protein with 1 M NaCl. We then further purified the E2.103 or E2.249 by
chromatography on a P60 gel filtration column equilibrated with 100 mM HEPES (pH 7.5), 0.1 mM EDTA and 1 mM DTT.

FTE103 - Following centrifugation of the lysate at 10,000 x g for 10 min. at 4°C, we recovered the FTE103 protein
(which precipitated) by resuspending the pellet in 6 M urea and adding solid guanidine-HCl to a final concentration of
7 M. After centrifuging the suspension, we purified the FTE103 protein from the supernatant by chromatography on
an A.5M gel filtration column in 6 M guanidine, 50 mM sodium phosphate (pH 5.4), 10 mM DTT. We collected the
FTE103-containing fractions from the gel filtration column according to the appearance of a band having an apparent
molecular weight of 19 kDa on Coomassie-stained SDS polyacrylamide electrophoresis gels.

FTE403 - Our purification procedure for FTE403 was essentially the same as that for FTE103, except that FTE403
migrated on the gel filtration column with an apparent molecular weight of 25 kDa.

FTE501 - Following centrifugation of the lysate at 10,000 x g, for 30 minutes, we resuspended the pellet in 6 M
urea, added solid guanidine-HCl to a final concentration of 6 M, and DTT to a concentration of 10 mM. After 30 minutes
at 37°C, we clarified the solution by centrifugation at 10,000 x g for 30 minutes. We then loaded the sample onto an
A.5 agarose gel filtration column in 6 M guanidine-HCl, 50 mM sodium phosphate (pH 5.4), 10 mM DTT and collected
the FTE501-containing fractions from the gel filtration column, according to the appearance of a band having an
apparent molecular weight of 40 kDa on Coomassie-stained SDS polyacrylamide electrophoresis gels. We loaded the
gel filtration-purified FTE501 onto a C^q reverse phase HPLC column and eluted with a gradient of 0-75% acetonitrile
in 0.1% trifluoroacetic acid. We collected the FTE501 protein in a single peak with an apparent molecular weight of 40
kDa.

TatΔcys-105 - Following centrifugation of the lysate, we loaded the supernatant onto a Q-Sepharose column
equilibrated with 25 mM Tris (pH 7.5), 0.5 mM EDTA. We loaded the Q-Sepharose column flow-through onto an
S-Sepharose column equilibrated with 25 mM MES (pH 6.0), after adjusting the Q-Sepharose column flow-through to
about pH 6.0 by adding MES (pH 6.0) to a final concentration of 30 mM. We recovered the tatΔcys-105 protein from
the S-Sepharose column by application of sequential NaCl concentration steps in 25 mM MES (pH 6.0). TatΔcys-105
eluted in the pH 6.0 buffer at 800-1000 mM NaCl.

TatΔcys-161 - Following centrifugation of the lysate, we loaded the supernatant onto an S-Sepharose column
equilibrated with 25 mM Tris (pH 7.5), 0.5 mM EDTA. We recovered the tatΔcys-161 from the S-Sepharose column by
application of a NaCl step gradient in 25 mM Tris (pH 7.5). TatΔcys-161 eluted in the pH 7.5 buffer at 500-700 mM NaCl.

TatΔcys-249 - Following centrifugation of the lysate, we loaded the supernatant onto a Q-Sepharose column
equilibrated with 25 mM Tris (pH 7.5), 0.5 mM EDTA. We recovered the tatΔcys-249 from the S-Sepharose column by
application of a NaCl step gradient in 25 mM Tris (pH 7.5). TatΔcys-249 eluted in the 600-800 mM portion of the NaCl
step gradient.

TatΔcys-HE2.85 and TatΔcys-HE2.121 - Following centrifugation of the lysate, we loaded the supernatant onto a
Q-Sepharose column. We loaded the flow-through onto an S-Sepharose column. We recovered the tatΔcys-HE2.85
or tatΔcys-HE2.121 from the S-Sepharose column by application of a NaCl step gradient. Both proteins eluted with 1
M NaCl.

HPV E2 and HPV E2CCSS - See Example 9 (above). 

JB106 - Following centrifugation of the lysate, and collection of the supernatant, we added NaCl to 300 mM. We
loaded the supernatant with added NaCl onto an S-Sepharose column equilibrated with 25 mM HEPES (pH 7.5). We
treated the column with sequential salt concentration steps in 25 mM HEPES (pH 7.5), 1.5 mM EDTA, 1 mM DTT. We
eluted the JB106 protein from the S-Sepharose column with 1 M NaCl.

JB117 - Following centrifugation of the lysate, and collection of the supernatant, we added NaCl to 300 mM. Due
to precipitation of JB117 at 300 mM NaCl, we diluted the JB117 supernatant to 100 mM NaCl and batch-loaded the
protein onto the S-Sepharose column. We eluted JB117 from the S-Sepharose column with 1 M NaCl in 25 mM Tris
(pH 7.5), 0.3 mM DTT.

JB118 - Following centrifugation of the lysate, and collection of the supernatant, we added NaCl to 300 mM. We
loaded the supernatant with added NaCl onto an S-Sepharose column equilibrated with 25 mM Tris (pH 7.5). We eluted
the JB118 protein from the S-Sepharose column with 1 M NaCl in 25 mM Tris (pH 7.5), 0.3 mM DTT.

JB119, JB120, JB121 and JB122 - Following centrifugation of the lysate, and collection of the supernatant, we
added NaCl to 150 mM for JB119 and JB121, and 200 mM for JB120 and JB122. We loaded the supernatant with
added NaCl onto an S-Sepharose column equilibrated with 25 mM Tris (pH 7.5). We eluted proteins JB119, JB120,
JB121 and JB122 from the S-Sepharose column with 1 M NaCl in 25 mM Tris (pH 7.5), 0.3 mM DTT.

EXAMPLE 14

E2 Repression Assays - Additional Conjugates 

We tested our tat-E2 fusion proteins for inhibition of transcriptional activation by the full-length papillomavirus E2
protein ("repression"). We measured E2 repression with a transient co-transfection assay in COS7 cells. The COS7
cells used in this assay were maintained in culture for only short periods of time. We thawed the COS7 cells at passage
13 and used them only through passage 25. Long periods of propagation led to low levels of E2 transcriptional activation
and decreased repression and reproducibility. Our repression assay and method of computing repression activity are
described in Example 9 (above). For the conjugates TxE2.103, TxE2.249, FTE103, FTE208, FTE403 and FTE501,
we substituted the BPV-1 E2 transactivator, in equal amount, for the HPV-16 E2 transactivator. Accordingly, instead
of transfecting with the HPV-16 E2 expression plasmid pAHE2, we transfected with the BPV-1 E2 expression plasmid
pXB323, which is fully described in United States patent 5,219,990.
The genetic fusion protein JB106 has consistently been our most potent tat-E2 repressor conjugate. Data from a
repression assay comparing JB106 and TxHE2CCSS are shown in Table III. Figure 13 graphically depicts the results
presented in Table III.

In addition to JB106, several other tat-E2 repressor conjugates have yielded significant repression. As shown in
Table IV, TxHE2, TxHE2CCSS, JB117, JB118, JB119, JB120 and JB122 displayed repression levels in the ++ range.



TABLE III

Protein added (µg/ml)   cpm-bkgd*   average of duplicates   average cpm-bkgd   % repression

  0                         3,872
  0                         3,694                   3,783
  0                        17,896
  0                        18,891                  18,393             14,610
  1  JB106                 16,384
  1  JB106                 17,249                  16,816             13,033           10.8
  3  JB106                 11,456
  3  JB106                 10,550                  11,003              7,220           50.6
 10  JB106                  6,170
 10  JB106                  7,006                   6,588              2,805           81.0
 30  JB106                  4,733
 30  JB106                  4,504                   4,618                835           94.3
  1  TxHE2CCSS             17,478
  1  TxHE2CCSS             18,047                  17,762             13,979            4.3
  3  TxHE2CCSS             14,687
  3  TxHE2CCSS             15,643                  15,165             11,382           22.1
 10  TxHE2CCSS             12,914
 10  TxHE2CCSS             12,669                  12,791              9,008           38.3
 30  TxHE2CCSS              7,956
 30  TxHE2CCSS              8,558                   8,257              4,474           69.4
  1  HE2.123               18,290
  1  HE2.123               18,744                  18,517             14,734              0
  3  HE2.123               17,666
  3  HE2.123               18,976                  18,321             14,538            1.3
 10  HE2.123               18,413
 10  HE2.123               17,862                  18,137             14,354            2.6
 30  HE2.123               18,255
 30  HE2.123               18,680                  18,467             14,684            0.3

*Bkgd = 158 cpm.
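
The arithmetic behind Table III can be retraced from the duplicate counts. The sketch below is an interpretation inferred from the printed numbers, not a statement of the method of Example 9: it assumes that the mean of the first pair of 0 µg/ml wells (3,783) is subtracted from each duplicate average, and that percent repression is taken relative to the net signal of the second pair of 0 µg/ml wells (14,610). All names are hypothetical.

```python
# Means of the two pairs of wells that received no added protein
BASAL_MEAN = (3872 + 3694) / 2        # first 0 µg/ml pair
ACTIVATED_MEAN = (17896 + 18891) / 2  # second 0 µg/ml pair
ACTIVATED_NET = ACTIVATED_MEAN - BASAL_MEAN


def percent_repression(dup_a: float, dup_b: float,
                       reference_net: float = ACTIVATED_NET) -> float:
    """Percent reduction of the net signal relative to the no-protein reference."""
    net = (dup_a + dup_b) / 2 - BASAL_MEAN
    return 100.0 * (1.0 - net / reference_net)


# Duplicate cpm values for JB106, keyed by dose in µg/ml
jb106 = {1: (16384, 17249), 3: (11456, 10550), 10: (6170, 7006), 30: (4733, 4504)}

for dose, (a, b) in jb106.items():
    print(f"JB106 {dose:>2} µg/ml: {percent_repression(a, b):.1f}% repression")
```

Under these assumptions the JB106 and TxHE2CCSS rows are reproduced to within rounding, apart from the 10 µg/ml JB106 entry (computed as 80.8 against the printed 81.0). The HE2.123 entries appear instead to be expressed relative to the 1 µg/ml HE2.123 net value of 14,734, which can be explored by passing a different `reference_net`.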



Table IV summarizes our tat-E2 repressor assay results. Although we tested all of our tat-E2 repressor conjugates
in similar assays, the conjugates were not all simultaneously tested in the same assay. Accordingly, we have expressed
the level of repression activity, semi-quantitatively, as +++, ++, +, +/-, or -, with +++ being strong repression, and -
being no detectable repression. Figure 13 illustrates the repression activity rating system used in Table IV. JB106
exemplifies the +++ activity level. TxHE2CCSS exemplifies the ++ activity level. The negative control, HE2.123,
exemplifies the - activity level. The + activity level is intermediate between the activity observed with TxHE2CCSS and
HE2.123. The two conjugates whose activity is shown as +/- had weak (but detectable) activity in some assays and
no detectable activity in other assays.

TABLE IV

Protein            Tat residues    E2 residues        Repression Level

TxE2.103           37-72           BPV-1 308-410      +
TxE2.249           37-72           BPV-1 162-410      -
TxHE2              37-72           HPV-16 245-365     ++
TxHE2CCSS          37-72           HPV-16 245-365     ++
FTE103             1-67            BPV-1 306-410      -
FTE208             1-62            BPV-1 311-410      -
FTE403             1-62            BPV-1 250-410      -
FTE501             1-62            BPV-1 162-410      -
TatΔcys-105        1-21, 38-67     BPV-1 306-410      -
TatΔcys-161        1-21, 38-62     BPV-1 250-410      +/-
TatΔcys-249        1-21, 38-62     BPV-1 162-410      +/-
TatΔcys-HE2.85     1-21, 38-72     HPV-16 281-365     +
TatΔcys-HE2.121    1-21, 38-72     HPV-16 245-365     +
JB106              47-58           HPV-16 245-365     +++
JB117              47-72           HPV-16 245-365     ++
JB118              38-72           HPV-16 245-365     ++
JB119              47-62           BPV-1 250-410      ++
JB120              38-62           BPV-1 250-410      ++
JB122              38-58           HPV-16 245-365     ++

FTE103, FTE403, FTE208 and FTE501, the four conjugates having the tat amino-terminal region (i.e., residues
1-21) and the cysteine-rich region (i.e., residues 22-37), were completely defective for repression. Since we have shown,
by indirect immunofluorescence, that FTE501 enters cells, we consider it likely that the E2 repressor activity has been
lost in the FTE series as a result of the linkage to the tat transport polypeptide. Our data show that the absence of the
cysteine-rich region of the tat moiety generally increased E2 repressor activity. In addition, the absence of the
cysteine-rich region in tat-E2 conjugates appeared to increase protein production levels in E. coli, and increase protein solubility,
without loss of transport into target cells. Deletion of the amino-terminal region of tat also increased E2 repressor
activity. Fusion protein JB106, with only tat residues 47-58, was the most potent of our tat-E2 repressor conjugates.
However, absence of the tat cysteine-rich region does not always result in preservation of E2 repressor activity in the
conjugate. For example, the chemical conjugate TxE2.249 was insoluble and toxic to cells. Thus, linkage of even a
cysteine-free portion of tat may lead to a non-functional E2 repressor conjugate.
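
The structure-activity pattern discussed above can be tabulated directly from Table IV. In this illustrative sketch the tat residue ranges and repression levels are transcribed from the table, and a conjugate is counted as carrying the cysteine-rich region only if one of its tat segments spans all of residues 22-37 (so tat37-72, which retains residue 37 alone, is not counted). The names and the grouping rule are assumptions made for the example.

```python
from collections import defaultdict

CYS_RICH_REGION = (22, 37)

# (conjugate, tat residue segments, repression level), transcribed from Table IV
TABLE_IV = [
    ("TxE2.103", [(37, 72)], "+"),
    ("TxE2.249", [(37, 72)], "-"),
    ("TxHE2", [(37, 72)], "++"),
    ("TxHE2CCSS", [(37, 72)], "++"),
    ("FTE103", [(1, 67)], "-"),
    ("FTE208", [(1, 62)], "-"),
    ("FTE403", [(1, 62)], "-"),
    ("FTE501", [(1, 62)], "-"),
    ("TatΔcys-105", [(1, 21), (38, 67)], "-"),
    ("TatΔcys-161", [(1, 21), (38, 62)], "+/-"),
    ("TatΔcys-249", [(1, 21), (38, 62)], "+/-"),
    ("TatΔcys-HE2.85", [(1, 21), (38, 72)], "+"),
    ("TatΔcys-HE2.121", [(1, 21), (38, 72)], "+"),
    ("JB106", [(47, 58)], "+++"),
    ("JB117", [(47, 72)], "++"),
    ("JB118", [(38, 72)], "++"),
    ("JB119", [(47, 62)], "++"),
    ("JB120", [(38, 62)], "++"),
    ("JB122", [(38, 58)], "++"),
]


def spans_region(segments, region=CYS_RICH_REGION) -> bool:
    """True if any single segment covers the whole region."""
    lo, hi = region
    return any(first <= lo and last >= hi for first, last in segments)


groups = defaultdict(list)
for name, segments, level in TABLE_IV:
    key = "with cysteine-rich region" if spans_region(segments) else "without cysteine-rich region"
    groups[key].append((name, level))

for key, members in groups.items():
    print(key)
    for name, level in members:
        print(f"  {name}: {level}")
```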



EXAMPLE 15 
Cleavable E2 Conjugates 

Chemical conjugation of tat moieties to E2 protein resulted in at least a 20-fold reduction in binding of the E2 protein 
to E2 binding sites on DNA (data not shown). Therefore, we conducted experiments to evaluate cleavable cross-linking 
between the tat transport moiety and the E2 repressor moiety. We tested various cleavable cross-linking methods. 

In one series of experiments, we activated the cysteine sulfhydryl groups of HPV E2-CCSS protein with aldrithiol
in 100 mM HEPES (pH 7.5), 500 mM NaCl. We isolated the activated E2 repressor by gel filtration chromatography
and treated it with tat37-72. We achieved low cross-linking efficiency because of rapid E2-CCSS dimer formation upon
treatment with aldrithiol. To avoid this problem, we put the E2-CCSS into 8 M urea, at room temperature, and treated
it with aldrithiol at 23°C for 60 minutes under denaturing conditions. We then refolded the E2CCSS-aldrithiol adduct,
isolated it by gel filtration chromatography and then allowed it to react with tat37-72. This procedure resulted in excellent
cross-linking. We also cross-linked E2CSSS and E2CCSC to tat37-72, using a modification of the urea method, wherein
we used S-Sepharose chromatography instead of gel filtration to isolate the E2-aldrithiol adducts. This modification
increased recovery of the adducts and resulted in cross-linkage of approximately 90% of the E2 starting material used
in the reaction.

The cleavable tat-E2 conjugates exhibited activity in the repression assay. However, the repression activity of the
cleavable conjugates was slightly lower than that of similar conjugates cross-linked irreversibly. The slightly lower
activity of the cleavable conjugates may be a reflection of protein half-life in the cells. Tat is relatively stable in cells.
E2 proteins generally have short half-lives in cells. Thus, irreversible cross-linkage between a tat moiety and an E2
moiety may stabilize the E2 moiety.

EXAMPLE 16 


Herpes Simplex Virus Repressor Conjugate 

Herpes simplex virus ("HSV") encodes a transcriptional activator, VP16, which induces expression of the immediate
early HSV genes. Friedman et al. have produced an HSV VP16 repressor by deleting the carboxy-terminal
transactivation domain of VP16 ("Expression of a Truncated Viral Trans-Activator Selectively Impedes Lytic Infection by Its
Cognate Virus", Nature, 335, pp. 452-54 (1988)). We have produced an HSV-2 VP16 repressor in a similar manner.

To test cellular uptake and VP16 repressor activity of transport polypeptide-VP16 repressor conjugates, we
simultaneously transfected a VP16-dependent reporter plasmid and a VP16 repressor plasmid into COS7 cells. Then we
exposed the transfected cells to a transport polypeptide-VP16 repressor conjugate or to an appropriate control. The
repression assay, described below, was analogous to the E2 repression assay described above, in Example 9.

VP16 Repression Assay Plasmids 

Our reporter construct for the VP16 repression assay was plasmid p175kCAT, obtained from G. Hayward (see, P.
O'Hare and G.S. Hayward, "Expression of Recombinant Genes Containing Herpes Simplex Virus Delayed-Early and
Immediate-Early Regulatory Regions and Trans Activation by Herpes Virus Infection", J. Virol., 52, pp. 522-31 (1984)).
Plasmid p175kCAT contains the HSV-1 IE175 promoter driving a CAT reporter gene.

Our HSV-2 transactivator construct for the VP16 repression assay was plasmid pXB324, which contained the
wild-type HSV-2 VP16 gene under the control of the chicken β-actin promoter. We constructed pXB324 by inserting into
pXB100 (P. Han et al., "Transactivation of Heterologous Promoters by HIV-1 Tat", Nuc. Acids Res., 19, pp. 7225-29
(1991)), between the XhoI site and BamHI site, a 280 base pair fragment containing the chicken β-actin promoter and
a 2318 base pair BamHI-EcoRI fragment from plasmid pCA5 (O'Hare and Hayward, supra) encoding the entire
wild-type HSV-2 VP16 protein.

Tat-VP16 Repressor Fusion Protein

We produced in bacteria fusion protein tat-VP16R.GF (SEQ ID NO:58), consisting of amino acids 47-58 of HIV tat
protein followed by amino acids 43-412 of HSV VP16 protein. For bacterial production of a tat-VP16 repressor fusion
protein, we constructed plasmid pET/tat-VP16R.GF in a three-piece ligation. The first fragment was the vector
pET-3d (described above under the alternate designation "pET-8c") digested with NcoI and BglII (approximately 4600 base
pairs). The second fragment consisted of synthetic oligonucleotides 374.219 (SEQ ID NO:39) and 374.220 (SEQ ID
NO:40), annealed to form a double-stranded DNA molecule. The 5' end of the synthetic fragment had an NcoI overhang
containing an ATG translation start codon. Following the start codon were codons for tat residues 47-58. Immediately
following the tat codons, in frame, were codons for VP16 residues 43-47. The 3' terminus of the synthetic fragment
was a blunt end for ligation to the third fragment, an 1134 base pair PvuII-BglII fragment from pXB324R4, containing
codons 48-412 of HSV-2 VP16. We derived pXB324R4 from pXB324 (described above). Plasmid pXB324R2 was an
intermediate in the construction of pXB324R4.
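
The residue ranges given for tat-VP16R.GF fix the expected size of the fusion protein and can be cross-checked against the stated fragment length. The short Python sketch below does that arithmetic. The helper name `span_length` is ours, and counting the initiator methionine as part of the product is an assumption; the text does not say whether it is retained in the bacterially produced protein.

```python
def span_length(first: int, last: int) -> int:
    """Number of residues (or codons) in the inclusive range first..last."""
    return last - first + 1


tat_residues = span_length(47, 58)      # tat 47-58
vp16_residues = span_length(43, 412)    # VP16 43-412

# Assumption: the Met encoded by the ATG in the NcoI overhang is counted.
fusion_length = 1 + tat_residues + vp16_residues

# Codons carried by the synthetic oligonucleotide fragment:
# ATG, then tat 47-58, then VP16 43-47.
synthetic_codons = 1 + span_length(47, 58) + span_length(43, 47)

# Coding sequence for VP16 codons 48-412, which must fit inside the
# 1134 base pair PvuII-BglII fragment described in the text.
third_fragment_coding_bp = 3 * span_length(48, 412)

print(fusion_length, synthetic_codons, third_fragment_coding_bp)
```

Under these assumptions the coding requirement for codons 48-412 is smaller than the 1134 base pair fragment, which is consistent with that fragment also carrying a stop codon and some 3' flanking sequence.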

We constructed pXB324R2 by inserting into pXB100 a 1342 base pair BamHI-AatII fragment, from pXB324,
encoding the N-terminal 419 amino acids of HSV-2 VP16. To provide an in-frame stop codon, we used a 73 base pair
AatII-EcoRI fragment from pSV2-CAT (C.M. Gorman et al., Molecular & Cellular Biology, 2, pp. 1044-51 (1982)). Thus,
pXB324R2 encoded the first 419 amino acids of HSV-2 VP16 and an additional seven non-VP16 amino acids preceding
the stop codon. To construct pXB324R4, we carried out a 3-piece ligation involving a 5145 base pair MluI-EcoRI
fragment from pXB324R2, and two insert fragments. One insert was a 115 base pair MluI-NspI fragment from pXB324R2,
encoding the first 198 residues of VP16. The second insert fragment was a double-stranded synthetic DNA molecule
consisting of the synthetic oligonucleotides 374.32 (SEQ ID NO:41) and 374.33 (SEQ ID NO:42). When annealed,
these oligonucleotides formed a 5' NspI sticky end and a 3' EcoRI sticky end. This synthetic fragment encoded VP16
residues 399-412, followed by a termination codon. Thus, plasmid pXB324R4 differed from pXB324R2 by lacking
codons for VP16 amino acids 413-419 and the seven extraneous amino acids preceding the stop codon.

Purification of tat-VP16R.GF Fusion Protein

We expressed our genetic construct for tat-VP16R.GF in E. coli. We harvested the transformed E. coli by
centrifugation; resuspended the cells in 8-10 volumes of lysis buffer (25 mM Tris (pH 7.5), 25 mM NaCl, 1 mM DTT, 0.5 mM
EDTA, 1 mM PMSF, 4 ng/ml E64, 50 µg/ml aprotinin, 50 µg/ml pepstatin A, and 3 mM benzamidine); lysed the cells
in a French press (2 passes at 12,000 psi); and centrifuged the lysate at 10,000 to 12,000 x g for 1 hour, at 4°C.
Following centrifugation of the lysate, we loaded the supernatant onto a Fast Q-Sepharose column equilibrated with
25 mM Tris (pH 7.5), 0.5 mM EDTA. We loaded the Q-Sepharose flow-through onto a Fast S-Sepharose column
equilibrated in 25 mM MES (pH 6.0), 0.1 mM EDTA, 2 mM DTT. We recovered the tat-VP16 fusion protein from the
S-Sepharose column with sequential NaCl concentration steps in 25 mM MES (pH 6.0), 0.1 mM EDTA, 2 mM DTT. The
tat-VP16 fusion protein eluted in the 600-1000 mM NaCl fractions.

VP16 Repression Assay

We seeded HeLa cells in 24-well culture plates at 10^ cells/well. The following day, we transfected the cells, using
the DEAE-dextran method, as described by B.R. Cullen, "Use of Eukaryotic Expression Technology in the Functional
Analysis of Cloned Genes", Meth. Enzymol., vol. 152, p. 684 (1987). We precipitated the DNA for the transfections and
redissolved it, at a concentration of approximately 100 µg/ml, in 100 mM NaCl, 10 mM Tris (pH 7.5). For each
transfection, the DNA-DEAE mix consisted of: 200 ng p175kCAT (+/- 1 ng pXB324) or 200 ng pSV-CAT (control), 1 mg/ml
DEAE-dextran, and PBS, to a final volume of 100 µl. We exposed the cells to this mixture for 15-20 minutes, at 37°C,
with occasional rocking of the culture plates. We then added to each well 1 ml fresh DC medium (DMEM + 10% serum)
with 80 µM chloroquine. After incubating the cells at 37°C for 2.5 hours, we aspirated the medium from each well and
replaced it with fresh DC containing 10% DMSO. After 2.5 minutes at room temperature, we aspirated the
DMSO-containing medium and replaced it with fresh DC containing 0, 10 or 50 µg/ml purified tat-VP16.GF. The following
day, we replaced the medium in each well with fresh medium of the same composition. Twenty-four hours later, we
lysed the HeLa cells with 0.65% NP-40 (detergent) in 10 mM Tris (pH 8.0), 1 mM EDTA, 150 mM NaCl. We measured
the protein concentration in each extract, for sample normalization in the assay.
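
The composition of the DNA-DEAE mix can be turned into pipetting volumes with simple unit arithmetic. The Python sketch below is illustrative only: the function name is ours, the DNA stock is taken at exactly the "approximately 100 µg/ml" stated, and the 1 mg/ml DEAE-dextran figure is read as the final concentration in the 100 µl mix, which the text does not state explicitly.

```python
def stock_volume_ul(mass_ng: float, stock_ug_per_ml: float) -> float:
    """Volume in microlitres of stock that holds mass_ng of DNA.

    1 ug/ml is the same as 1 ng/ul, so the division needs no unit factor.
    """
    return mass_ng / stock_ug_per_ml


DNA_STOCK_UG_PER_ML = 100.0   # assumption: exactly 100 ug/ml
FINAL_VOLUME_UL = 100.0       # final volume of the mix per transfection

reporter_ul = stock_volume_ul(200.0, DNA_STOCK_UG_PER_ML)   # 200 ng p175kCAT
activator_ul = stock_volume_ul(1.0, DNA_STOCK_UG_PER_ML)    # 1 ng pXB324

# Assumption: 1 mg/ml is the final DEAE-dextran concentration (1 mg/ml = 1 ug/ul).
deae_dextran_ug = 1.0 * FINAL_VOLUME_UL

print(f"p175kCAT: {reporter_ul:.2f} ul, pXB324: {activator_ul:.2f} ul, "
      f"DEAE-dextran: {deae_dextran_ug:.0f} ug per transfection")
```

The very small volume computed for the 1 ng of pXB324 suggests that a more dilute working stock would be needed in practice; the text does not describe one.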

At a tat-VP16.GF concentration of 50 µg/ml, cellular toxicity interfered with the assay. At a concentration of 10 µg/ml,
the tat-VP16.GF fusion protein yielded almost complete repression of VP16-dependent CAT expression, with no
visible cell death and approximately 30% repression of non-VP16-dependent CAT expression in controls. Thus, we
observed specific repression of VP16-dependent transactivation in addition to a lesser amount of non-specific repression.

EXAMPLE 17 


Transport polypeptide - DNA Conjugates

Transcriptional activation by a DNA-binding transcription factor can be inhibited by introducing into cells DNA
having the binding site for that transcription factor. The transcription factor becomes bound by the introduced DNA and
is rendered unavailable to bind at the promoter site where it normally functions. This strategy has been employed to
inhibit transcriptional activation by NF-κB (Bielinska et al., "Regulation of Gene Expression with Double-Stranded
Phosphorothioate Oligonucleotides", Science, vol. 250, pp. 997-1000 (1990)). Bielinska et al. observed dose-dependent
inhibition when the double-stranded DNA was put in the cell culture medium. We conjugated the transport polypeptide
tat 37-72 to the double-stranded DNA molecule to determine whether such conjugation would enhance the inhibition
by increasing the cellular uptake of the DNA.

We purchased four custom-synthesized 39-mer phosphorothioate oligonucleotides designated NF1, NF2, NF3
and NF4, having nucleotide sequences (SEQ ID NO:43), (SEQ ID NO:44), (SEQ ID NO:45) and (SEQ ID NO:46),
respectively. NF1 and NF2 form a duplex corresponding to the wild-type NF-κB binding site. NF3 and NF4 form a
duplex corresponding to a mutant NF-κB binding site.

We dissolved NF1 and NF3 in water, at a concentration of approximately 4 mg/ml. We then put 800 µg of NF1 and
NF3 separately into 400 µl of 50 mM triethanolamine (pH 8.2), 50 mM NaCl, 10 mM Traut's reagent. We allowed the
reaction to proceed for 50 minutes at room temperature. We stopped the reaction by gel filtration on a P6DG column
(BioRad, Richmond, CA) equilibrated with 50 mM HEPES (pH 6.0), 50 mM NaCl, to remove excess Traut's reagent.
We monitored 260 nm absorbance to identify the oligonucleotide-containing fractions. Our recovery of the
oligonucleotides was approximately 75%. We then annealed Traut-modified NF1 with NF2 (0.55 mg/ml final concentration) and
annealed Traut-modified NF3 with NF4 (0.50 mg/ml final concentration). Finally, we allowed 0.4 mg of each
Traut-modified DNA to react with 0.6 mg of tat37-72-BMH (prepared as described in Example 9, above), in 1 ml of 100 mM
HEPES (pH 7.5), for 60 minutes at room temperature. We monitored the extent of the cross-linking reaction by
polyacrylamide gel electrophoresis followed by ethidium bromide staining of the gel. In general, we observed that about
50% of the DNA was modified under these conditions.

These double-stranded DNA molecules were tested, essentially according to the methods of Bielinska et al. (supra),
with and without tat linkage, for inhibition of NF-κB transcriptional activation. Tat linkage significantly enhanced the
inhibition of transactivation by NF-κB.






EP 0 656 950 B1 



Recombinant DNA sequences prepared by the processes described herein are exemplified by a culture deposited
in the American Type Culture Collection, Rockville, Maryland. The Escherichia coli culture identified as pJB106 was
deposited on July 28, 1993 and assigned ATCC accession number 69368.

While we have described a number of embodiments of this invention, it is apparent that our basic constructions
can be altered to provide other embodiments that utilize the processes and products of this invention. Therefore, it will
be appreciated that the scope of this invention is to be defined by the appended claims rather than by the specific
embodiments that have been presented by way of example.

SEQUENCE LISTING 

(1) GENERAL INFORMATION:

(i) APPLICANT:

BIOGEN, INC.
BARSOUM, James G. (US only)
FAWELL, Stephen E. (US only)
PEPINSKY, R. B. (US only)

(ii) TITLE OF INVENTION: TAT-DERIVED TRANSPORT POLYPEPTIDES

(iii) NUMBER OF SEQUENCES: 63

(iv) CORRESPONDENCE ADDRESS:

(A) ADDRESSEE: FISH & NEAVE
(B) STREET: 1251 Avenue of the Americas
(C) CITY: New York
(D) STATE: New York
(E) COUNTRY: USA
(F) ZIP: 10020

(v) COMPUTER READABLE FORM:

(A) MEDIUM TYPE: Floppy disk
(B) COMPUTER: IBM PC compatible
(C) OPERATING SYSTEM: PC-DOS/MS-DOS
(D) SOFTWARE: PatentIn Release #1.0, Version #1.25

(vi) CURRENT APPLICATION DATA:

(A) APPLICATION NUMBER:
(B) FILING DATE:
(C) CLASSIFICATION:

(vii) PRIOR APPLICATION DATA:

(A) APPLICATION NUMBER: US 07/934,375
(B) FILING DATE: 21-AUG-1992

(viii) ATTORNEY/AGENT INFORMATION:

(A) NAME: Haley Jr., James R
(B) REGISTRATION NUMBER: 27,794
(C) REFERENCE/DOCKET NUMBER: B170CIP

(ix) TELECOMMUNICATION INFORMATION:

(A) TELEPHONE: (212) 596-9000
(B) TELEFAX: (212) 596-9090
(C) TELEX: 14-8367

(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 86 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(vi) ORIGINAL SOURCE:

(A) ORGANISM: human immunodeficiency virus
(B) STRAIN: type 1

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

Met Glu Pro Val Asp Pro Arg Leu Glu Pro Trp Lys His Pro Gly Ser
1 5 10 15

Gln Pro Lys Thr Ala Cys Thr Asn Cys Tyr Cys Lys Lys Cys Cys Phe
20 25 30

His Cys Gln Val Cys Phe Ile Thr Lys Ala Leu Gly Ile Ser Tyr Gly
35 40 45

Arg Lys Lys Arg Arg Gln Arg Arg Arg Pro Pro Gln Gly Ser Gln Thr
50 55 60

His Gln Val Ser Leu Ser Lys Gln Pro Thr Ser Gln Ser Arg Gly Asp
65 70 75 80

Pro Thr Gly Pro Lys Glu
85

(2) INFORMATION FOR SEQ ID NO:2:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 36 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:

Cys Phe Ile Thr Lys Ala Leu Gly Ile Ser Tyr Gly Arg Lys Lys Arg
1 5 10 15

Arg Gln Arg Arg Arg Pro Pro Gln Gly Ser Gln Thr His Gln Val Ser
20 25 30

Leu Ser Lys Gln
35
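
The relationship between the full-length tat sequence (SEQ ID NO:1) and the tat 37-72 transport polypeptide (SEQ ID NO:2) can be checked mechanically. The following Python sketch is offered as an illustration rather than as part of the listing: the helper names are ours, and the sequences are transcribed from the listing above using the standard three-letter codes.

```python
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(three_letter: str) -> str:
    """Convert whitespace-separated three-letter codes to one-letter codes."""
    return "".join(THREE_TO_ONE[residue] for residue in three_letter.split())


def region(sequence: str, first: int, last: int) -> str:
    """Return residues first..last (1-based, inclusive)."""
    return sequence[first - 1:last]


SEQ_ID_1 = to_one_letter("""
    Met Glu Pro Val Asp Pro Arg Leu Glu Pro Trp Lys His Pro Gly Ser
    Gln Pro Lys Thr Ala Cys Thr Asn Cys Tyr Cys Lys Lys Cys Cys Phe
    His Cys Gln Val Cys Phe Ile Thr Lys Ala Leu Gly Ile Ser Tyr Gly
    Arg Lys Lys Arg Arg Gln Arg Arg Arg Pro Pro Gln Gly Ser Gln Thr
    His Gln Val Ser Leu Ser Lys Gln Pro Thr Ser Gln Ser Arg Gly Asp
    Pro Thr Gly Pro Lys Glu
""")

SEQ_ID_2 = to_one_letter("""
    Cys Phe Ile Thr Lys Ala Leu Gly Ile Ser Tyr Gly Arg Lys Lys Arg
    Arg Gln Arg Arg Arg Pro Pro Gln Gly Ser Gln Thr His Gln Val Ser
    Leu Ser Lys Gln
""")

basic_region = region(SEQ_ID_1, 49, 57)        # tat basic region
cysteine_rich_region = region(SEQ_ID_1, 22, 36)
tat_37_72 = region(SEQ_ID_1, 37, 72)

print(len(SEQ_ID_1), basic_region, tat_37_72 == SEQ_ID_2)
```

The comparison confirms that SEQ ID NO:2 is residues 37-72 of SEQ ID NO:1, so it contains the basic region (49-57) and lies entirely outside the cysteine-rich region (22-36).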



(2) INFORMATION FOR SEQ ID NO:3:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:

Cys Phe Ile Thr Lys Ala Leu Gly Ile Ser Tyr Gly Arg Lys Lys Arg
1 5 10 15

Arg Gln Arg Arg Arg Pro
20

(2) INFORMATION FOR SEQ ID NO:4:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 24 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:

Phe Ile Thr Lys Ala Leu Gly Ile Ser Tyr Gly Arg Lys Lys Arg Arg
1 5 10 15

Gln Arg Arg Arg Pro Gly Gly Cys
20

(2) INFORMATION FOR SEQ ID NO:5:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 15 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:

Cys Gly Gly Tyr Gly Arg Lys Lys Arg Arg Gln Arg Arg Arg Pro
1 5 10 15

(2) INFORMATION FOR SEQ ID NO:6:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 15 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:6:

Tyr Gly Arg Lys Lys Arg Arg Gln Arg Arg Arg Pro Gly Gly Cys
1 5 10 15

(2) INFORMATION FOR SEQ ID NO:7:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 56 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:7:

Met Glu Pro Val Asp Pro Arg Leu Glu Pro Trp Lys His Pro Gly Ser
1 5 10 15

Gln Pro Lys Thr Ala Phe Ile Thr Lys Ala Leu Gly Ile Ser Tyr Gly
20 25 30

Arg Lys Lys Arg Arg Gln Arg Arg Arg Pro Pro Gln Gly Ser Gln Thr
35 40 45

His Gln Val Ser Leu Ser Lys Gln
50 55

(2) INFORMATION FOR SEQ ID NO:8:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 39 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:8:

GATCCCAGAC CCACCAGGTT TCTCTGTCGG GCCCTTAAG

(2) INFORMATION FOR SEQ ID NO:9:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 39 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:9:

AATTCTTAAG GGCCCGACAG AGAAACCTGG TGGGTCTGG
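
SEQ ID NO:8 and SEQ ID NO:9 are both 39 nucleotides long and appear to be a complementary pair. The Python sketch below checks this and reports the single-stranded ends left after annealing. It is an illustration only: the function names are ours, and the 4-nucleotide 5' overhang on each strand is an assumption that the code verifies rather than something stated in this part of the listing.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def annealed_duplex(top: str, bottom: str, overhang: int = 4):
    """Return (top 5' overhang, paired core, bottom 5' overhang).

    Assumes each strand has a 5' single-stranded overhang of `overhang`
    nucleotides; raises ValueError if the remaining bases do not pair.
    """
    core = top[overhang:]
    if reverse_complement(bottom)[:len(core)] != core:
        raise ValueError("oligonucleotides are not complementary")
    return top[:overhang], core, bottom[:overhang]


SEQ_ID_8 = "GATCCCAGACCCACCAGGTTTCTCTGTCGGGCCCTTAAG"
SEQ_ID_9 = "AATTCTTAAGGGCCCGACAGAGAAACCTGGTGGGTCTGG"

top_overhang, core, bottom_overhang = annealed_duplex(SEQ_ID_8, SEQ_ID_9)
print(top_overhang, len(core), bottom_overhang)
```

The 5'-GATC and 5'-AATT ends found this way are the overhangs left by BamHI and EcoRI digestion, respectively, which would be consistent with cloning the annealed pair between those two sites; how the pair was actually used is described elsewhere in the specification.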

(2) INFORMATION FOR SEQ ID NO:10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5098 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: circular 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:10:

TTGAAGACGA AAGGGCCTCG TGATACGCCT ATTTTTATAG GTTAATGTCA TGATAATAAT 60
CGTTTCTTAG ACGTCAGGTG GCACTTTTCG GGGAAATGTG CGCGGAACCC CTATTTGTTT 120
ATTTTTCTAA ATACATTCAA ATATGTATCC GCTCATGAGA CAATAACCCT GATAAATGCT 180
TCAATAATAT TGAAAAAGGA AGAGTATGAG TATTCAACAT TTCCGTGTCG CCCTTATTCC 240
CTTTTTTGCG GCATTTTGCC TTCCTGTTTT TGCTCACCCA GAAACGCTGG TGAAAGTAAA 300
AGATGCTGAA GATCAGTTGG GTGCACGAGT GGGTTACATC GAACTGGATC TCAACAGCGG 360
TAAGATCCTT GAGAGTTTTC GCCCCGAAGA ACGTTTTCCA ATGATGAGCA CTTTTAAAGT 420
TCTGCTATGT GGCGCGGTAT TATCCCGTGT TGACGCCGGG CAAGAGCAAC TCGGTCGCCG 480
CATACACTAT TCTCAGAATG ACTTGGTTGA GTACTCACCA GTCACAGAAA AGCATCTTAC 540
GGATGGCATG ACAGTAAGAG AATTATGCAG TGCTGCCATA ACCATGAGTG ATAACACTGC 600
GGCCAACTTA CTTCTGACAA CGATCGGAGG ACCGAAGGAG CTAACCGCTT TTTTGCACAA 660
CATGGGGGAT CATGTAACTC GCCTTGATCG TTGGGAACCG GAGCTGAATG AAGCCATACC 720

AAACGACGAG CGTGACACCA CGATGCCTGC AGCAATGGCA ACAACGTTGC GCAAACTATT 780
AACTGGCGAA CTACTTACTC TAGCTTCCCG GCAACAATTA ATAGACTGGA TGGAGGCGGA 840
TAAAGTTGCA GGACCACTTC TGCGCTCGGC CCTTCCGGCT GGCTCGTTTA TTGCTGATAA 900
ATCTGGAGCC GGTGAGCGTG GGTCTCGCGG TATCATTGCA GCACTCGGGC CAGATGGTAA 960
GCCCTCCCGT ATCGTAGTTA TCTACACGAC GGGGAGTCAG GCAACTATGG ATGAACGAAA 1020
TAGACAGATC GCTGAGATAG GTGCCTCACT CATTAAGCAT TGGTAACTGT CACACCAAGT 1080
TTACTCATAT ATACTTTAGA TTGATTTAAA ACTTCATTTT TAATTTAAAA GGATCTAGGT 1140
GAAGATCCTT TTTGATAATC TCATGACCAA AATCCCTTAA CGTGAGTTTT CGTTCCACTG 1200
AGCGTCAGAC CCCGTAGAAA AGATCAAAGG ATCTTCTTGA GATCCTTTTT TTCTGCGCGT 1260
AATCTGCTGC TTGCAAACAA AAAAACCACC CCTACCAGCG GTGGTTTGTT TGCCGGATCA 1320
ACAGCTACCA ACTCTTTTTC CGAAGGTAAC TGGCTTCAGC AGAGCGCAGA TACCAAATAC 1380
TGTCCTTCTA GTGTAGCCGT AGTTAGGCCA CCACTTCAAG AACTCTGTAG CACCGCCTAC 1440
ATACCTCGCT CTGCTAATCC TGTTACCACT GGCTGCTGCC AGTGGCGATA AGTCGTGTCT 1500
TACCGGGTTG GACTCAAGAC GATAGTTACC GGATAAGGCG CAGCGGTCGG GCTGAACGGG 1560
GGGTTCGTGC ACACAGCCCA GCTTGGAGCG AACGACCTAC ACCGAACTGA GATACCTACA 1620
GCGTGAGCAT TGAGAAAGCG CCACGCTTCC CGAAGGGAGA AAGGCGGACA GGTATCCGGT 1680
AAGCGGCAGG GTCGGAACAG GAGAGCGCAC GAGGGAGCTT CCAGGGGGAA ACGCCTGGTA 1740
TCTTTATAGT CCTGTCGGGT TTCGCCACCT CTGACTTGAG CGTCGATTTT TGTGATGCTC 1800
GTCAGGGGGG CGGAGCCTAT GGAAAAACGC CAGCAACGCG GCCTTTTTAC GCTTCCTGGC 1860
CTTTTGCTGG CCTTTTGCTC ACATGTTCTT TCCTGCGTTA TCCCCTGATT CTGTGGATAA 1920
CCGTATTACC GCCTTTGAGT GAGCTGATAC CGCTCGCCGC AGCCGAACGA CCGAGCGCAG 1980
CCAGTCAGTG AGCGAGGAAG CGGAAGACCG CCTGATGCGG TATTTTCTCC TTACGCATCT 2040
GTGCGGTATT TCACACCGCA TATATGGTGC ACTCTCAGTA CAATCTGCTC TGATGCCGCA 2100
TAGTTAAGCC AGTATACACT CCGCTATCGC TACGTGACTG GGTCATGGCT GCGCCCCGAC 2160
ACCCGCCAAC ACCCGCTGAC GCGCCCTGAC GGGCTTGTCT GCTCCCGGCA TCCGCTTACA 2220
GACAAGCTGT GACCGTCTCC GGGAGCTGCA TGTGTCAGAG GTTTTCACCG TCATCACCGA 2280
AACGCGCGAG GCAGCTGCGG TAAAGCTCAT CAGCGTGGTC GTGAAGCGAT TCACAGATGT 2340

CTGCCTGTTC ATCCGCGTCC AGCTCGTTGA GTTTCTCCAG AAGCGTTAAT GTCTGGCTTC 2400
TGATAAAGCG GGCCATGTTA AGGGCGGTTT TTTCCTGTTT GGTCACTTGA TGCCTCCGTG 2460
TAAGGGGGAA TTTCTGTTCA TGGGGGTAAT GATACCGATG AAACGAGAGA GGATGCTCAC 2520
GATACGGGTT ACTGATCATG AACATGCCCG GTTACTGGAA CGTTGTGAGG GTAAACAACT 2580
GGCGGTATGG ATGCGGCGGG ACCAGAGAAA AATCACTCAG GGTCAATGCC AGCGCTTCGT 2640
TAATACAGAT GTAGGTGTTC CACAGGGTAG CCAGCAGCAT CCTGCGATGC AGATCCGGAA 2700
CATAATGGTG CAGGGCGCTG ACTTCCGCGT TTCCAGACTT TACGAAACAC GGAAACCGAA 2760
GACCATTCAT GTTGTTGCTC AGGTCGCAGA CGTTTTGCAG CAGCAGTCGC TTCACGTTCG 2820
CTCGCGTATC GGTGATTCAT TCTGCTAACC AGTAAGGCAA CCCCGCCAGC CTAGCCGGGT 2880
CCTCAACGAC AGCAGCACGA TCATGCGCAC CCGTGGCCAG GACCCAACGC TGCCCGAGAT 2940
GCGCCGCGTG CGGCTGCTGG AGATGGCGGA CGCGATGGAT ATGTTCTGCC AAGGGTTGGT 3000
TTGCGCATTC ACAGTTCTCC GCAAGAATTG ATTGGCTCCA ATTCTTGGAG TGGTGAATCC 3060
GTTAGCGAGG TGCCGCCGGC TTCCATTCAG CTCGAGGTGG CCCGGCTCCA TGCACCGCGA 3120
CGCAACGCGG GGAGGCAGAC AAGGTATAGG GCGGCGCCTA CAATCCATGC CAACCCGTTC 3180
CATGTGCTCG CCGAGGCGGC ATAAATCGCC GTGACGATCA GCGGTCCAGT GATCGAAGTT 3240
AGGCTGGTAA GAGCCGCGAG CGATCCTTGA AGCTGTCCCT CATCCTCGTC ATCTACCTGC 3300
CTGGACAGCA TGCCCTGCAA CGCGGGCATC CCGATGCCGC CGGAAGCGAG AAGAATCATA 3360
ATGGGGAAGG CCATCCAGCC TCGCGTCGCG AACGCCAGCA AGACGTAGCC CAGCGCGTCG 3420
GCCGCCATGC CGGCGATAAT GGCCTGCTTC TCGCCGAAAC GTTTGGTGGC GGGACCAGTG 3480
ACGAAGGCTT GAGCGAGGGC GTGCAAGATT CCGAATACCG CAAGCGACAG GCCGATCATC 3540
GTCGCGCTCC AGCGAAAGCG GTCCTCGCCG AAAATGACCC AGAGCGCTGC CGGCACCTGT 3600
CCTACGAGTT CCATGATAAA GAAGACACTC ATAAGTCCGG CGACGATAGT CATGCCCCGC 3660
GCCCACCGGA AGGAGCTGAC TGGGTTGAAG GCTCTCAAGG GCATCGGTCG ACGCTCTCCC 3720
TTATGCGACT CCTGCATTAG GAAGCAGCCC AGTAGTAGGT TGAGGCCGTT GAGCACCGCC 3780
GCCGCAAGGA ATGGTGCATG CAAGGAGATG GCGCCCAACA GTCCCCCGGC CACGGGGCCT 3840
GCCACCATAC CCACGCCGAA ACAAGCGCTC ATGAGCCCGA AGTGGCGAGC CCGATCTTCC 3900
CCATCGGTGA TGTCGGCGAT ATAGGCGCCA GCAACCGCAC CTGTGGCGCC GGTGATGCCG 3960
GCCACGATGC GTCCGGCGTA GAGGATCGAG ATCTCGATCC CGCGAAATTA ATACGACTCA 4020

CTATAGGGAG ACCACAACGG TTTCCCTCTA GAAATAATTT TGTTTAACTT TAAGAAGGAG 4080
ATATACATAT GGAACCGGTC GACCCGCGTC TGGAACCATG GAAACACCCC GGGTCCCAGC 4140
CGAAAACCGC GTGCACCAAC TGCTACTGCA AAAAATGCTG CTTCCACTGC CAGGTTTGCT 4200
TCATCACCAA AGCCCTAGGT ATCTCTTACG GCCGTAAAAA ACGTCGTCAG CGACGTCGTC 4260
CGCCGCAGGG ATCCCAGACC CACCAGGTTT CTCTGTCGGG CCCGGCGGAC AGCGGCCACG 4320
CCCTGCTGGA GCGCAACTAT CCCACTGGCG CGGAGTTCCT CGGCGACGGC GGCGACGTCA 4380
GCTTCAGCAC CCGCGGCACG CAGAACTGGA CGGTGGAGCG GCTGCTCCAG GCGCACCGCC 4440
AACTGGAGGA GCGCGGCTAT GTGTTCGTCG GCTACCACGG CACCTTCCTC GAAGCGGCGC 4500
AAAGCATCGT CTTCGGCGGG GTGCGCGCGC GCAGCCACGA CCTCGACCCG ATCTGGCGCG 4560
GTTTCTATAT CGCCGGCGAT CCGGCGCTGG CCTACGGCTA CGCCCAGGAC CAGGAACCCG 4620
ACGCACGCGG CCGGATCCGC AACGGTCCCC TGCTGCGGGT CTATGTGCCG CGCTCGAGCC 4680
TGCCGGGCTT CTACCGCACC AGCCTGACCC TGGCCGCGCC GGAGGCGGCG GGCGAGGTCG 4740
AACGGCTGAT CGGCCATCCG CTGCCGCTGC GCCTGGACGC CATCACCGGC CCCGAGGAGG 4800
AAGGCGGGCG CCTGGAGACC ATTCTCGGCT GGCCGCTGGC CGAGCGCACC GTGGTGATTC 4860
CCTCGGCGAT CCCCACCGAC CCGCGCAACG TCGGCGGCGA CCTCGACCCG TCCAGCATCC 4920
CCGACAAGGA ACAGGCGATC AGCGCCCTGC CGGACTACGC CAGCCAGCCC GGCAAACCGC 4980
CGCGCGAGGA CCTGAAGTAA CTGCCGCGAC CGGCCGGCTC CCTTCGCAGG AGCCGGCCTT 5040
CTCGGGGCCT GGCCATACAT CAGGTTTTCC TGATGCCAGC CCAATCGAAT ATGAATTC 5098
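
The row of SEQ ID NO:10 that ends at position 4140 contains an ATG followed by codons that appear to encode the amino terminus of tat as given in SEQ ID NO:1. The Python sketch below translates that row with the standard genetic code as a check. It is an illustration under stated assumptions: the helper `translate` and the constant names are ours, translation simply starts at the first ATG in the row, and the trailing incomplete codon is ignored.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, ordered to match product(BASES, repeat=3).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate complete codons from the first ATG, stopping at a stop codon."""
    start = dna.find("ATG")
    if start < 0:
        return ""
    protein = []
    for i in range(start, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Positions 4081-4140 of SEQ ID NO:10, with the spacing removed.
ROW_4140 = "ATATACATATGGAACCGGTCGACCCGCGTCTGGAACCATGGAAACACCCCGGGTCCCAGC"

# First 17 residues of SEQ ID NO:1 in one-letter code.
TAT_N_TERMINUS = "MEPVDPRLEPWKHPGSQ"

print(translate(ROW_4140), translate(ROW_4140) == TAT_N_TERMINUS)
```

The same approach can be extended to the following rows to follow the reading frame further, provided the rows are joined before translation so that codons spanning a row boundary are kept intact.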


(2) INFORMATION FOR SEQ ID NO:11:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 4910 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: circular

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:11:

TTGAAGACGA AAGGGCCTCG TGATACGCCT ATTTTTATAG GTTAATGTCA TGATAATAAT 60
GGTTTCTTAG ACGTCAGGTG GCACTTTTCG GGGAAATGTG CGCGGAACCC CTATTTGTTT 120
ATTTTTCTAA ATACATTCAA ATATGTATCC GCTCATGAGA CAATAACCCT GATAAATGCT 180
TCAATAATAT TGAAAAAGGA AGAGTATGAG TATTCAACAT TTCCGTGTCG CCCTTATTCC 240
CTTTTTTGCG GCATTTTGCC TTCCTGTTTT TGCTCACCCA GAAACGCTGG TGAAAGTAAA 300
AGATGCTGAA GATCAGTTGG GTGCACGAGT GGGTTACATC GAACTGGATC TCAACAGCGG 360
TAAGATCCTT GAGAGTTTTC GCCCCGAAGA ACGTTTTCCA ATGATGAGCA CTTTTAAAGT 420
TCTGCTATGT GGGGCGGTAT TATCCCGTGT TGACGCCGGG CAAGAGCAAC TCCGTCGCCG 480
CATACACTAT TCTCAGAATG ACTTGGTTGA GTACTCACCA GTCACAGAAA AGCATCTTAC 540
GGATGGCATG ACAGTAAGAG AATTATGCAG TGCTGCCATA ACCATGAGTG ATAACACTGC 600
GGCCAACTTA CTTCTGACAA CGATCGGAGG ACCGAAGGAG CTAACCGCTT TTTTGCACAA 660
CATGGGGGAT CATGTAACTC GCCTTGATCG TTGGGAACCG GAGCTGAATG AAGCCATACC 720
AAACGACGAG CGTGACACCA CGATGCCTGC AGCAATGGCA ACAACGTTGC GCAAACTATT 780
AACTGGCGAA CTACTTACTC TAGCTTCCCG GCAACAATTA ATAGACTGGA TGGAGGCGGA 840
TAAAGTTGCA GGACCACTTC TGCGCTCGGC CCTTCCGGCT GGCTGGTTTA TTGCTGATAA 900
ATCTGGAGCC GGTGAGCGTG GGTCTCGCGG TATCATTGCA GCACTGGGGC CAGATGGTAA 960
GCCCTCCCGT ATCGTAGTTA TCTACACGAC GGGGAGTCAG GCAACTATGG ATGAACGAAA 1020
TAGACAGATC GCTGAGATAG GTGCCTCACT GATTAAGCAT TGGTAACTGT CAGACCAAGT 1080
TTACTCATAT ATACTTTAGA TTGATTTAAA ACTTCATTTT TAATTTAAAA GGATCTAGGT 1140
GAAGATCCTT TTTGATAATC TCATGACCAA AATCCCTTAA CGTGAGTTTT CGTTCCACTG 1200
AGCGTCAGAC CCCGTAGAAA AGATCAAAGG ATCTTCTTGA GATCCTTTTT TTCTGCGCGT 1260
AATCTGCTGC TTGCAAACAA AAAAACCACC GCTACCAGCG GTGGTTTGTT TGCCGGATCA 1320
AGAGCTACCA ACTCTTTTTC CGAAGGTAAC TGGCTTCAGC AGAGCGCAGA TACCAAATAC 1380
TGTCCTTCTA GTGTAGCCGT AGTTAGGCCA CCACTTCAAG AACTCTGTAG CACCGCCTAC 1440
ATACCTCGCT CTGCTAATCC TGTTACCAGT GGCTGCTGCC AGTGGCGATA AGTCGTGTCT 1500
TACCGGGTTG GACTCAAGAC GATAGTTACC GGATAAGGCG CAGCGGTCGG GCTGAACGGG 1560
GGGTTCGTGC ACACAGCCCA GCTTCGAGCG AACGACCTAC ACCGAACTGA GATACCTACA 1620
GCGTGAGCAT TGAGAAAGCG CCACGCTTCC CGAAGGGAGA AAGGCGGACA GGTATCCGGT 1680
AAGCGGCAGG GTCGGAACAG GAGAGCGCAC GAGGGAGCTT CCAGGGGGAA ACGCCTGGTA 1740
TCTTTATAGT CCTGTCGGGT TTCGCCACCT CTGACTTGAG CGTCGATTTT TGTGATGCTC 1800
GTCAGGGGGG CGGAGCCTAT CGAAAAACGC CAGCAACGCG GCCTTTTTAC GGTTCCTGGC 1860
CTTTTGCTGG CCTTTTGCTC ACATGTTCTT TCCTGCGTTA TCCCCTGATT CTGTGGATAA 1920

CCTACGAGTT GCATGATAAA GAAGACAGTC ATAAGTGCGG CGACGATAGT CATGCCCCGC 3660
GCCCACCGGA AGGAGCTGAC TGGGTTGAAG GCTCTCAAGG GCATCGGTCG ACGCTCTCCC 3720
TTATGCGACT CCTGCATTAG GAAGCAGCCC AGTAGTAGGT TGAGGCCGTT GAGCACCGCC 3780
GCCGCAAGGA ATGGTGCATG CAAGGAGATG GCGCCCAACA GTCCCCCGGC CACGGGGCCT 3840
GCCACCATAC CCACGCCGAA ACAAGCGCTC ATGAGCCCGA AGTGGCGAGC CCGATCTTCC 3900
CCATCGGTGA TGTCGGCGAT ATAGGCGCCA GCAACCGCAC CTGTGGCGCC GGTGATGCCG 3960
GCCACGATGC GTCCGGCGTA GAGGATCGAG ATCTCGATCC CGCGAAATTA ATACGACTCA 4020
CTATAGGGAG ACCACAACGG TTTCCCTCTA CAAATAATTT TCTTTAACTT TAAGAAGGAG 4080
ATATATATGG AACCGGTCGT TTCTCTGTCG GGCCCGGCGG ACAGCGGCGA CGCCCTGCTG 4140
GAGCGCAACT ATCCCACTGG CGCGGAGTTC CTCGGCGACC GCGGCGACGT CAGCTTCAGC 4200
ACCCGCGGCA CGCAGAACTG GACGGTGGAG CGGCTGCTCC AGGCGCACCG CCAACTGGAG 4260
GAGCGCGGCT ATGTGTTCGT CGGCTACCAC GGCACCTTCC TCGAAGCGGC GCAAAGCATC 4320
GTCTTCGGCG GGGTGCGCGC GCGCAGCCAG GACCTCGACG CGATCTGGCG CGGTTTCTAT 4380
ATCGCCGGCG ATCCGGCGCT GGCCTACGGC TACGCCCAGG ACCAGGAACC CGACGCACGC 4440
GGCCGGATCC GCAACGGTGC CCTGCTGCGG GTCTATGTGC CGCGCTCGAG CCTGCCGGGC 4500
TTCTACCGCA CCAGCCTGAC CCTGGCCGCC CCGGAGGCGG CGGGCGAGGT CGAACGGCTG 4560
ATCGGCCATC CGCTGCCGCT GCGCCTGGAC GCCATCACCG GCCCCGAGGA GGAAGGCGGG 4620
CGCCTGGAGA CCATTCTCGG CTGGCCGCTG GCCGAGCGCA CCGTGGTGAT TCCCTCGGCG 4680
ATCCCCACCG ACCCGCGCAA CGTCGGCGGC GACCTCGACC CGTCCAGCAT CCCCGACAAG 4740
GAACACGCGA TCAGCGCCCT GCCGGACTAC GCCAGCCAGC CCGGCAAACC GCCGCGCGAG 4800
GACCTGAAGT AACTGCCGCG ACCGGCCGGC TCCCTTCGCA GGAGCCGGCC TTCTCGGGGC 4860
CTGGCCATAC ATCAGGTTTT CCTGATGCCA GCCCAATCGA ATATGAATTC 4910



(2) INFORMATION FOR SEQ ID NO:12:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 29 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:12:

TATGGAACCG GTCGTTTCTC TGTCGGGCC 29

(2) INFORMATION FOR SEQ ID NO:13:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 23 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:13:

CGACAGAGAA ACGACCGGTT CCA 23



(2) INFORMATION FOR SEQ ID NO:14:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 4977 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: circular

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:14:

TTCTTGAAGA CGAAAGGGCC TCGTGATACG CCTATTTTTA TAGGTTAATG TCATGATAAT 60

AATGGTTTCT TAGACGTCAG GTGGCACTTT TCGGGGAAAT GTGCGCGGAA CCCCTATTTG 120 

TTTATTTTTC TAAATACATT CAAATATGTA TCCGCTCATG AGACAATAAC CCTGATAAAT 180 

GCTTCAATAA TATTGAAAAA GGAAGAGTAT GAGTATTCAA CATTTCCGTG TCGCCCTTAT 240 

TCCCTTTTTT GCGGCATTTT GCCTTCCTGT TTTTGCTCAC CCAGAAACGC TGGTGAAAGT 300 

AAAAGATGCT GAAGATCAGT TGGGTGCACG AGTGGGTTAC ATCGAACTGG ATCTCAACAG 360 

CGGTAAGATC CTTGAGAGTT TTCGCCCCGA AGAACGTTTT CCAATGATGA GCACTTTTAA 420 

AGTTCTGCTA TGTGGCGCGG TATTATCCCG TGTTGACGCC GGGCAAGAGC AACTCGGTCG 480

CCGCATACAC TATTCTCAGA ATGACTTGGT TGAGTACTCA CCAGTCACAG AAAAGCATCT 540 

TACGGATGGC ATGACAGTAA GAGAATTATG CAGTGCTGCC ATAACCATGA GTGATAACAC 600 

TGCGGCCAAC TTACTTCTGA CAACGATCGG AGGACCGAAG GAGCTAACCG CTTTTTTGCA 660 
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SO 



55 



CAACATCGGG GATCATGTAA CTCGCCTTGA TCGTTGGGAA CCGGAGCTGA ATGAAGCCAT 720 

ACCAAACGAC GAGCGTGACA CCACGATGCC TGCAGCAATG GCAACAACGT TGCGCAAACT 780 

ATTAACTGGC GAACTACTTA CTCTAGCTTC CCGGCAACAA TTAATAGACT GGATGGAGGC 840 

GGATAAAGTT GCAGGACCAC TTCTGCGCTC GGCCCTTCCG GCTGGCTGGT TTATTGCTGA 900 

TAAATCTGGA GCCGGTGAGC GTGGGTCTCG CGGTATCATT GCAGCACTGG GGCCAGATGG 960 

TAAGCCCTCC CGTATCGTAG TTATCTACAC GACGGGGAGT CAGGCAACTA TGGATGAACG 1020

AAATAGACAG ATCGCTGAGA TAGGTGCCTC ACTGATTAAG CATTGGTAAC TGTCAGACCA 1080 

AGTTTACTCA TATATACTTT AGATTGATTT AAAACTTCAT TTTTAATTTA AAAGGATCTA 1140 

GGTGAAGATC CTTTTTGATA ATCTCATGAC CAAAATCCCT TAACGTGAGT TTTCGTTCCA 1200 

CTGAGCGTCA GACCCCGTAG AAAAGATCAA AGGATCTTCT TGAGATCCTT TTTTTCTGCG 1260 

CGTAATCTGC TGCTTGCAAA CAAAAAAACC ACCGCTACCA GCGGTGGTTT GTTTGCCGGA 1320

TCAAGAGCTA CCAACTCTTT TTCCGAAGGT AACTGGCTTC AGCAGAGCGC AGATACCAAA 1380 

TACTGTCCTT CTAGTGTAGC CGTAGTTAGG CCACCACTTC AAGAACTCTG TAGCACCGCC 1440 

TACATACCTC GCTCTGCTAA TCCTGTTACC AGTGGCTGCT GCCAGTGGCG ATAAGTCGTG 1500 

TCTTACCGGG TTGGACTCAA GACGATAGTT ACCGGATAAG GCGCAGCGGT CCGGCTGAAC 1560 

GGGGGGTTCG TGCACACAGC CCAGCTTGGA GCGAACGACC TACACCGAAC TGAGATACCT 1620 

ACAGCGTGAG CATTGAGAAA GCGCCACGCT TCCCGAAGGG AGAAAGGCGG ACAGGTATCC 1680 

GGTAAGCGGC AGGGTCGGAA CAGGAGAGCG CACGAGGGAG CTTCCAGGGG GAAACGCCTG 1740 

GTATCTTTAT AGTCCTGTCG GGTTTCGCCA CCTCTGACTT GAGCCTCGAT TTTTGTGATG 1800 

CTCGTCAGGG GGGCGGAGCC TATGGAAAAA CGCCAGCAAC GCGGCCTTTT TACGGTTCCT 1860 

GGCCTTTTCC TGGCCTTTTG CTCACATGTT CTTTCCTGCG TTATCCCCTG ATTCTGTGGA 1920 

TAACCGTATT ACCGCCTTTG AGTGAGCTGA TACCGCTCGC CGCAGCCGAA CGACCGAGCG 1980 

CAGCGAGTCA GTGAGCGAGG AAGCGGAAGA GCGCCTGATG CGGTATTTTC TCCTTACGCA 2040 

TCTGTGCGGT ATTTCACACC GCATATATGG TGCACTCTCA GTACAATCTG CTCTGATGCC 2100 

GCATAGTTAA GCCAGTATAC ACTCCGCTAT CGCTACGTGA CTGGGTCATG GCTGCGCCCC 2160 

GACACCCGCC AACACCCGCT GACGCGCCCT GACGGGCTTG TCTGCTCCCG GCATCCGCTT 2220 

ACAGACAAGC TGTGACCGTC TCCGGGAGCT GCATGTGTCA GAGGTTTTCA CCGTCATCAC 2280 

CGAAACGCGC GAGGCAGCTG CGGTAAAGCT CATCAGCGTG GTCGTGAAGC GATTCACAGA 2340
TCTCXCCCTO TTCATCCCCO TCCACCTCGT TCACTTTCTC CACAACCGTT AATGTCTCCC 2400

TTCTCATAAA GCCCCCCATG TTAAGCGCGG TTTTTTCCTG TTTGCTCACT T0AT6CCTCC 2460 

CTGTAAGGGG GAATTTCTGT TCATCGCGGT AATCATACCX5 AXGAA&CGAG ACACGATCCT 2S20 

CACGATACGC CTTACTGATC AtCAACATCC CCGGTTACTC GAACCnCTG AGGGTAAACA 2580 

ACTG6CGGTA TGGA7GCGCC GGGACCAGAG AAAAATCACT CAGGCTCAAT GCCAGCGCTT 2640 

OCTTAATACA CATCTACGTC TTCCACAGGG TACCCAGCAC CATCC7GCGA TCCAGATCCC 2700 

CAACATAATC GTCCACGGCC CTGACTTCCG CCTTTCCACA CTTTACGAAA CACCCAAACC 2760 

GAAGACCATT CATCTTGTTG CTCAGGTCGC ACACGTTTXG CAGCA6CAG7 CGCTTCACCT 2820 

TCGCrCGCGT ATCCGTCATT CATTCTGCTA ACCAGTAAGG CAACCCCGCC ACCCTA6CCC 2880 

20 GGTCCtCAAC GACAGOAGCA CCATCATGCC CACCCGTGGC CAGCACCCAA CGCTGCCCGA 2940 

GATG0CCC6C GTGCGGCtGC TGGAGATGGC GGACGCGATG CATATCTTCT GCCAAGCeTT 3000 

GGTTTCCCCA TTCACACTTC TCCCCAAGAA TTGATTGGCT CCAATTCTTG GACTGCTGAA 3060 

TCCGTTAGCG ACGTCCCCCC GCCTTCCATT CAGCTCGAGG TCCCCCCCCT CCATGCAOCG 3120 

CGACGCAACG CGGGGAGGCA GACAAGCTAT AGGGOGGCGC CTACAATCCA tCCCAACCCG 3180 

TTCCATGTGC TCGCCCACGC GCCATAAATC CCCCT6ACCA TCACCCGTCC AGTCATCCAA 3240 

GTTAGCCTCG TAA6ACCCCC GAGCGATCCT TGAACCTGTC CCnSATGCTC GTCAXCTACC 3300 

ICCCTGGACA GCATGGCCTG CAACGCGGCC ATCCCGATGC OGCOSGAA6C GAGAACAATC 3360 

ATAATGCGGA AGGCCATCCA GCCTCGCGTC CCCAACCCCA CC3kAGACGTA CCCCAGCGCG 3420 

TCGGCCGCCA TGCCGGCGAT AATCGCCTGC TTCTCGCOGA AAC C TTT C CT GOOGCGACCA 3480 

6TCACGAAGG CTTGAGCGAG GGCG7GCAA6 ATTCCGAATA COGCAAGOGA CA6GCCGATC 3S40 

ATCGTCGCGC TCCAGCGAAA GCGGTCC7CG CCGAAAATGA COCACACCGC T6CCCCCACC 3600 

XGTCCTACCA GTTOCATCAT AAAGAACACA GTCATAA6TC CG6CGACCAT AGTCATGCCC 3660 

CGCGCCCACC GGAACCAGCT CACIGGGTTG AAGGCTCTCA AGCGCATCCC TCCACCCTCT 3720 

CCCTTATGCG ACTCCTGCAT TAGGAACCAG CCCAGTAGTA GGTTGAGCCC GTTCAGCACC 3780 

SO GOCGCCGCAA CGAAT6CTCC ATGCAAGCAG ATCCCGCCdA ACACTCCCCC GGOCACGCCG 3840 

CCTGCCACCA TACCCACGCC CAAACAAGCC CTCATCACCC CCAACTGCCG AGCCCCATCT 3900 

TCCCCATCGC TGATCTt3CCC GATATACCCC CCACCAACCC CACCTGTGGC CCCGCTCATG 3960 

CCCGCCACCA TCCCTCCCGC CTAGACGATC CAGATCTCCA TCCCCCGAAA TTAATACGAC 4020
TCACTATAGG GAGACCACAA CGGTTTCCCT CTAGAAATAA TTTTGTTTAA CTTTAAGAAG 4080
GAGATATACC ATGGTACCAG ACACCGGAAA CCCCTGCCAC ACCACTAAGT TGTTGCACAG 4140
AGACTCAGTG GACAGTGCTC CAATCCTCAC TGCATTTAAC AGCTCACACA AAGGACGGAT 4200
TAACTGTAAT AGTAACACTA CACCCATAGT ACATTTAAAA GGTGATGCTA ATACTTTAAA 4260
ATGTTTAAGA TATAGATTTA AAAAGCATTG TACATTGTAT ACTGCAGTGT CGTCTACATG 4320
GCATTGGACA GGACATAATG TAAAACATAA AAGTGCAATT GTTACACTTA CATATGATAG 4380
TGAATGGCAA CGTGACCAAT TTTTGTCTCA AGTTAAAATA CCAAAAACTA TTACAGTGTC 4440
TACTGGATTT ATGTCTATAT GAGGATCCGG CTGCTAACAA AGCCCGAAAG GAAGCTGAGT 4500
TGGCTGCTGC CACCGCTGAG CAATAACTAG CATAACCCCT TGGGGCCTCT AAACGGGTCT 4560
TGAGGGGTTT TTTGCTGAAA GGAGGAACTA TATCCGGATA TCCACAGGAC GGGTGTGGTC 4620
GCCATGATCG CGTAGTCGAT AGTGGCTCCA AGTAGCGAAG CGAGCAGGAC TGGGCGGCGG 4680
CCAAAGCGGT CGGACAGTGC TCCGAGAACG GGTGCGCATA GAAATTGCAT CAACGCATAT 4740
AGCGCTAGCA GCACGCCATA GTGACTGGCG ATGCTGTCGG AATGGACGAT ATCCCGCAAG 4800
AGGCCCGGCA GTACCGGCAT AACCAAGCCT ATGCCTACAG CATCCAGGGT GACGGTGCCG 4860
AGGATGACGA TGAGCGCATT GTTAGATTTC ATACACGGTG CCTGACTGCG TTAGCAATTT 4920
AACTGTGATA AACTACCGCA TTAAAGCTTA TCGATGATAA GCTGTCAAAC ATGAGAA 4977

(2) INFORMATION FOR SEQ ID NO:15:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 27 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:15:

CTCCCATGGT ACCAGACACC GGAAACC

(2) INFORMATION FOR SEQ ID NO:16:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 27 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:16:

GGGGGATCCT CATATAGACA TAAATCC 27
(2) INFORMATION FOR SEQ ID NO:17:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 4977 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: circular

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:17:

TTCTTGAAGA CGAAAGGGCC TCGTGATACG CCTATTTTTA TAGGTTAATG TCATGATAAT 60 

AATGGTTTCT TAGACGTCAG GTGGCACTTT TCGGGGAAAT GTGCGCGGAA CCCCTATTTG 120

TTTATTTTTC TAAATACATT CAAATATGTA TCCGCTCATG AGACAATAAC CCTGATAAAT 180 

GCTTCAATAA TATTGAAAAA GGAAGAGTAT GAGTATTCAA CATTTCCGTG TCGCCCTTAT 240 

TCCCTTTTTT GCGGCATTTT GCCTTCCTGT TTTTGCTCAC CCAGAAACGC TGGTGAAAGT 300 

AAAAGATGCT GAAGATCAGT TGGGTGCACG AGTGGGTTAC ATCGAACTGG ATCTCAACAG 360 

CGGTAAGATC CTTGAGAGTT TTCGCCCCGA AGAACGTTTT CCAATGATGA GCACTTTTAA 420

AGTTCTGCTA TGTGGCGCGG TATTATCCCG TGTTGACGCC GGGCAAGAGC AACTCGGTCG 480 

CCGCATACAC TATTCTCAGA ATGACTTGGT TGAGTACTCA CCAGTCACAG AAAAGCATCT 540 

TACGGATGGC ATGACAGTAA GAGAATTATG CAGTGCTGCC ATAACCATGA GTGATAACAC 600 

TGCGGCCAAC TTACTTCTGA CAACGATCGG AGGACCGAAG GAGCTAACCG CTTTTTTGCA 660 

CAACATGGGG GATCATGTAA CTCGCCTTGA TCGTTGGGAA CCGGAGCTGA ATGAAGCCAT 720 

ACCAAACGAC GAGCGTGACA CCACGATGCC TGCAGCAATG GCAACAACGT TGCGCAAACT 780 

ATTAACTGGC GAACTACTTA CTCTAGCTTC CCGGCAACAA TTAATAGACT GGATGGAGGC 840 

GGATAAAGTT GCAGGACCAC TTCTGCGCTC GGCCCTTCCG GCTGGCTGGT TTATTGCTGA 900 

TAAATCTGGA GCCGGTGAGC GTGGGTCTCG CGGTATCATT GCAGCACTGG GGCCAGATGG 960 

TAAGCCCTCC CGTATCGTAG TTATCTACAC GACGGGGAGT CAGGCAACTA TGGATGAACG 1020

AAATAGACAG ATCGCTGAGA TAGGTGCCTC ACTGATTAAG CATTGGTAAC TGTCAGACCA 1080

AGTTTACTCA TATATACTTT AGATTGATTT AAAACTTCAT TTTTAATTTA AAAGGATCTA 1140 

GGTGAAGATC CTTTTTGATA ATCTCATGAC CAAAATCCCT TAACGTGAGT TTTCGTTCCA 1200 

CTGAGCGTCA GACCCCGTAG AAAAGATCAA AGGATCTTCT TGAGATCCTT TTTTTCTGCG 1260 

CGTAATCTGC TGCTTGCAAA CAAAAAAACC ACCGCTACCA GCGGTGGTTT GTTTGCCGGA 1320 

TCAAGAGCTA CCAACTCTTT TTCCGAAGGT AACTGGCTTC AGCAGAGCGC AGATACCAAA 1380 

TACTGTCCTT CTAGTGTAGC CGTAGTTAGG CCACCACTTC AAGAACTCTG TAGCACCGCC 1440 

TACATACCTC GCTCTGCTAA TCCTCTTACC AGTGGCTGCT GCCAGTGGCG ATAAGTCGTG 1500 

TCTTACCGGG TTGGACTCAA GACGATAGTT ACCGGATAAG GCGCAGCGGT CGGGCTGAAC 1560 

GGGGGGTTCG TGCACACAGC CCAGCTTGGA GCGAACGACC TACACCGAAC TGAGATACCT 1620 

ACAGCGTGAG CATTGAGAAA GCGCCACGCT TCCCGAAGGG AGAAAGGCGG ACAGGTATCC 1680 

CGTAAGCGGC AGGGTCGGAA CAGGAGAGCG CACGAGGGAG CTTCCAGGGG GAAACGCCTG 1740 

GTATCTTTAT AGTCCTGTCG GGTTTCGCCA CCTCTGACTT GAGCGTCGAT TTTTGTGATG 1800 

CTCGTCAGGG GGGCGGAGCC TATGGAAAAA CGCCAGCAAC GCGGCCTTTT TACGGTTCCT 1860 

GGCCTTTTGC TGGCCTTTTG CTCACATGTT CTTTCCTGCG TTATCCCCTG ATTCTGTGGA 1920 

TAACCGTATT ACCGCCTTTG AGTGAGCTGA TACCGCTCGC CGCAGCCGAA CGACCGAGCG 1980 

CAGCGAGTCA GTGAGCGAGG AACCGGAAGA GCGCCTGATG CGGTATTTTC TCCTTACGCA 2040 

TCTGTGCGGT ATTTCACACC GCATATATGG TGCACTCTCA GTACAATCTG CTCTGATGCC 2100

GCATAGTTAA GCCAGTATAC ACTCCGCTAT CGCTACGTGA CTGGGTCATG GCTGCGCCCC 2160 

GACACCCGCC AACACCCGCT GACGCGCCCT GACGGGCTTG TCTGCTCCCG GCATCCGCTT 2220 

ACAGACAACC TGTGACCGTC TCCGGGAGCT GCATGTGTCA GAGGTTTTCA CCGTCATCAC 2280 

CGAAACGCGC GAGGCAGCTG CGGTAAAGCT CATCAGCGTG GTCGTGAAGC GATTCACAGA 2340 

TGTCTGCCTG TTCATCCGCG TCCAGCTCGT TGAGTTTCTC CAGAAGCGTT AATGTCTGGC 2400

TTCTGATAAA GCGGGCCATG TTAAGGGCGG TTTTTTCCTG TTTGGTCACT TGATGCCTCC 2460 

GTGTAAGGGG GAATTTCTGT TCATGGGGGT AATGATACCG ATGAAACGAG AGAGGATGCT 2520 

CACGATACGG GTTACTGATG ATGAACATGC CCGGTTACTG GAACGTTGTG AGGGTAAACA 2580 

ACTGGCGGTA TGGATGCGGC GGGACCAGAG AAAAATCACT CAGGGTCAAT GCCAGCGCTT 2640 

CGTTAATAGA GATGTAGGTG TTCCACAGGG TAGCCAGCAG CATCCTGCGA TGCAGATCCG 2700

GAACATAATG GTGCAGCGCG CTGACTTCCG CGTTTCCAGA CTTTACGAAA CACGGAAACC 2760

GAAGACCATT CATGTTGTTG CTCAGGTCGC AGACGTTTTG CAGCAGCAGT CGCTTCACGT 2820

TCGCTCGCGT ATCGGTGATT CATTCTCCTA ACCAGTAAGG CAACCCCGCC AGCCTAGCCG 2880 

GGTCCTCAAC GACAGGAGCA CGATCATGCG CACCCGTGGC CAGGACCCAA CGCTGCCCGA 2940 

GATGCGCCGC GTGCGGCTGC TGGAGATGGC GGACGCGATG GATATGTTCT GCCAAGGGTT 3000 

GGTTTGCGCA TTCACAGTTC TCCGCAAGAA TTGATTGGCT CCAATTCTTG GAGTGGTGAA 3060 

TCCGTTAGCG AGGTGCCGCC GGCTTCCATT CAGGTCGAGG TGGCCCGGCT CCATGCACCG 3120 

CGACGCAACG CGGGGAGGCA GACAAGGTAT AGGGCGGCGC CTACAATCCA TGCCAACCCG 3180 

TTCCATGTGC TCGCCGAGGC GGCATAAATC GCCGTGACGA TCAGCGGTCC AGTGATCGAA 3240

GTTAGGCTGG TAAGAGCCGC GAGCGATCCT TGAAGCTGTC CCTGATGGTC GTCATCTACC 3300 

TGCCTGGACA GCATGGCCTG CAACGCGGGC ATCCCGATCC CGCCGCAAGC GAGAAGAATC 3360 

ATAATGGGGA AGGCCATCCA GCCTCGCGTC GCGAACGCCA GCAAGACGTA GCCCAGCGCG 3420

TCGGCCGCCA TGCCGGCGAT AATGGCCTGC TTCTCGCCGA AACGTTTGGT GGCGGGACCA 3480

GTGACGAAGG CTTGAGCGAG GGCGTGCAAG ATTCCGAATA CCGCAAGCGA CAGGCCGATC 3540 

ATCGTCGCGC TCCAGCGAAA GCGGTCCTCG GCGAAAATGA CCCAGAGCGC TGCCGGCACC 3600 

TGTCCTACGA GTTGCATGAT AAAGAAGACA GTCATAAGTG CGGCGACGAT AGTCATGCCC 3660 

CGCGCCCACC GGAAGGAGCT GACTGGGTTG AAGGCTCTCA AGGGCATCGG TCGACGCTCT 3720 

CCCTTATGCG ACTCCTGCAT TAGGAAGCAG CCCAGTAGTA GGTTGAGGCC GTTGAGCACC 3780

GCCGCCGCAA GGAATGGTGC ATGCAAGGAG ATGGCGCCCA ACAGTCCCCC GGCCACGGGG 3840 

CCTGCCACCA TACCCACGCC GAAACAAGCG CTCATGAGCC CGAAGTGGCG AGCCCGATCT 3900 

TCCCCATCGG TGATGTCGGC GATATAGGCG CCAGCAACCG CACCTGTGGC GCCGGTGATG 3960 

CCGGCCACGA TGCGTCCGGC GTAGAGGATC GAGATCTCGA TCCCGCGAAA TTAATACGAC 4020 

TCACTATAGG GAGACCACAA CGGTTTCCCT CTAGAAATAA TTTTGTTTAA CTTTAAGAAG 4080

GAGATATACC ATGGTACCAG ACACCGGAAA CCCCTGCCAC ACCACTAAGT TGTTGCACAG 4140 

AGACTCAGTG GACAGTGCTC CAATCCTCAC TGCATTTAAC AGCTCACACA AAGGACGGAT 4200

TAACTGTAAT AGTAACACTA CACCCATAGT ACATTTAAAA GGTGATGCTA ATACTTTAAG 4260 

ATCTTTAAGA TATAGATTTA AAAAGCATTC TACATTGTAT ACTGCAGTGT CGTCTACATG 4320 

GCATTGGACA GGACATAATG TAAAACATAA AAGTGCAATT GTTACACTTA CATATGATAG 4380 

TGAATGGCAA CGTGACCAAT TTTTGTCTCA AGTTAAAATA CCAAAAACTA TTACAGTGTC 4440

TACTGGATTT ATGTCTATAT GAGGATCCGG CTGCTAACAA AGCCCGAAAG GAAGCTGAGT 4500 

TGGCTGCTGC CACCGCTGAG CAATAACTAG CATAACCCCT TGGGGCCTCT AAACGGGTCT 4560 

TGAGGGGTTT TTTGCTGAAA GGAGGAACTA TATCCGGATA TCCACAGGAC GGGTGTGGTC 4620 

GCCATGATCG CGTAGTCGAT AGTGGCTCCA AGTAGCGAAG CGAGCAGGAC TGGGCGGCGG 4680

CCAAAGCGGT CGGACAGTGC TCCGAGAACG GGTGCGCATA GAAATTGCAT CAACGCATAT 4740 

AGCGCTAGCA GCACGCCATA GTGACTGGCG ATGCTGTCGG AATGGACGAT ATCCCGCAAG 4800 

AGGCCCGGCA GTACCGGCAT AACCAAGCCT ATGCCTACAG CATCCAGGGT GACGGTGCCG 4860 

AGGATGACGA TGAGCGCATT GTTAGATTTC ATACACGGTG CCTGACTGCG TTAGCAATTT 4920 

AACTGTGATA AACTACCGCA TTAAAGCTTA TCGATGATAA GCTGTCAAAC ATGAGAA 4977



(2) INFORMATION FOR SEQ ID NO:18:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 59 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:18:

CGACACTGCA GTATACAATG TAGAATGCTT TTTAAATCTA TATCTTAAAG ATCTTAAAG 59 



(2) INFORMATION FOR SEQ ID NO:19:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 26 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:19:

GCGTCGGCCG CCATGCCGGC GATAAT 26

(2) INFORMATION FOR SEQ ID NO:20:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 4819 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: circular

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:20:

TTCTTGAACA CGAAAGGGCC TCGTGATACG CCTATTTTTA TACGTTAATC TCATCATAAT 60

AATCCTTTCT TAGACGTCAC CTCCCACTTT TCCCCCAAAT GTCCCCCCAA CCCCTATTTC 120

TTTATTTTTC TAAATACATT CAAATATCTA TCCGCTCATG AGACAATAAC CCTGATAAAT 180

CCTTCAATAA TATTGAAAAA CCAAGAGTAT CACTATTCAA CATTTCCGTG TCGCCCTTAT 240

TCCCTTTTTT CCCCCATTTT GCCTTCCTCT TTTTCCTCAC CCAGAAACGC TCGTGAAAGT 300

AAAAGATGCT CAAGATCAGT TGGGTGCACG AGTGCCTTAC ATCGAACTGG ATCTCAACAG 360

CGGTAAGATC CTTGAGAGTT TTCCCCCCCA ACAACCTTTT CCAATCATCA GCACTTTTAA 420

ACTTCTCCTA TGTCGCGCGG TATTATCCCG TCTTGACGCC GGGCAACAGC AACTCGCTCG 480

CCGCATACAC TATTCTCAGA ATGACTTGCT TGAGTACTCA CCAGTCACAG AAAAGCATCT 540

TACCGATGCC ATGACACTAA CAGAATTATC CAGTCCTGCC ATAACCATGA GTGATAACAC 600

TCCCGCCAAC TTACTTCTCA CAACCATCGG AGGACCGAAG GAGCTAACCC CTTTTTTGCA 660

CAACATGCGG CATCATGTAA CTCGCCTTCA TCGTTCCGAA CCCGACCTCA ATGAAGCCAT 720

ACCAAACGAC GAGCGTGACA CCACGATCCC TGCAGCAATG GCAACAACGT TGCGCAAACT 780

ATTAACTGGC GAACTACTTA CTCTACCTTC CCGCCAACAA TTAATAGACT GGATGGAGGC 840

GGATAAACTT GCAGGACCAC TTCTGCGCTC GGCCCTTCCC GCTGGCTGGT TTATTGCTGA 900

TAAATCTGGA GCCGGTGAGC GTGGGTCTCG CGGTATCATT GCAGCACTGG GGCCAGATGG 960

TAAGCCCTCC CGTATCGTAG TTATCTACAC GACCGCGAGT CAGGCAACTA TGGATCAACG 1020

AAATAGACAG ATCGCTGACA TAGGTGCCTC ACTCATTAAG CATTCGTAAC TGTCACACCA 1080

AGTTTACTCA TATATACTTT ACATTGATTT AAAACTTCAT TTTTAATTTA AAACGATCTA 1140

GGTCAAGATC CTTTTTCATA ATCTCATGAC CAAAATCCCT TAACGTGAGT TTTCGTTCCA 1200

CTCACCGTCA GACCCCGTAG AAAAGATCAA AGCATCTTCT TGACATCCTT TTTTTCTCCG 1260

CCTAATCTGC TCCTTCCAAA CAAAAAAACC ACCCCTACCA GCCGTCGTTT GTTTGCCCGA 1320

TCAACACCTA CCAACTCTTT TTCCGAAGGT AACTGGCTTC AGCAGAGCGC AGATACCAAA 1380

TACTGTCCTT CTAGTGTAGC CGTAGTTAGG CCACCACTTC AAGAACTCTG TAGCACCGCC 1440

TACATACCTC GCTCTGCTAA TCCTGTTACC AGTGGCTGCT GCCAGTGGCG ATAAGTCGTG 1500

TCTTACCGGG TTGGACTCAA GACGATAGTT ACCGGATAAG GCGCAGCGGT CGGGCTGAAC 1560

GGGGGGTTCG TGCACACAGC CCAGCTTGGA GCGAACGACC TACACCGAAC TGAGATACCT 1620

ACAGCGTGAG CATTGAGAAA GCGCCACGCT TCCCGAAGGG AGAAAGGCGG ACAGGTATCC 1680

GGTAAGCGGC AGGGTCGGAA CAGGAGAGCG CACGAGGGAG CTTCCAGGGG GAAACGCCTG 1740

GTATCTTTAT AGTCCTGTCG GGTTTCGCCA CCTCTGACTT GAGCGTCGAT TTTTGTGATG 1800

CTCGTCAGGG GGGCGGAGCC TATGGAAAAA CGCCAGCAAC GCGGCCTTTT TACGGTTCCT 1860

GGCCTTTTGC TGGCCTTTTG CTCACATGTT CTTTCCTGCG TTATCCCCTG ATTCTGTGGA 1920

TAACCGTATT ACCGCCTTTG AGTGAGCTGA TACCGCTCGC CGCAGCCGAA CGACCGAGCG 1980

CAGCGAGTCA GTGAGCGAGG AAGCGGAAGA GCGCCTGATG CGGTATTTTC TCCTTACGCA 2040

TCTGTGCGGT ATTTCACACC GCATATATGG TGCACTCTCA GTACAATCTG CTCTGATGCC 2100

CCATAGTTAA GCCAGTATAC ACTCCGCTAT CGCTACGTGA CTGGGTCATG GCTGCGCCCC 2160

GACACCCGCC AACACCCGCT GACGCGCCCT GACGGGCTTG TCTGCTCCCG GCATCCGCTT 2220

ACAGACAAGC TGTGACCGTC TCCGGGAGCT GCATGTGTCA GAGGTTTTCA CCGTCATCAC 2280

CGAAACGCGC GAGGCAGCTG CGGTAAAGCT CATCAGCGTG GTCGTGAAGC GATTCACAGA 2340

TGTCTGCCTG TTCATCCGCG TCCAGCTCGT TGAGTTTCTC CAGAAGCCTT AATGTCTGGC 2400

TTCTGATAAA GCGGGCCATG TTAAGCGCGG TTTTTTCCTG TTTGGTCACT TGATGCCTCC 2460

GTGTAAGGGG GAATTTCTGT TCATGGGGGT AATGATACCG ATGAAACGAG AGAGGATGCT 2520

CACGATACGG GTTACTGATG ATGAACATGC CCGGTTACTG GAACGTTGTG AGGGTAAACA 2580

ACTGGCGGTA TGGATGCGGC GGGACCAGAG AAAAATCACT CAGGGTCAAT GCCAGCGCTT 2640

CGTTAATACA GATGTAGGTG TTCCACAGGG TAGCCAGCAG CATCCTGCGA TGCAGATCCG 2700

GAACATAATG GTGCAGGGCG CTGACTTCCG CGTTTCCAGA CTTTACGAAA CACGGAAACC 2760

GAAGACCATT CATGTTGTTG CTCAGGTCGC AGACGTTTTG CAGCAGCAGT CGCTTCACGT 2820

TCGCTCGCGT ATCGGTGATT CATTCTGCTA ACCACTAAGG CAACCCCGCC AGCCTAGCCG 2880

GGTCCTCAAC GACAGGAGCA CGATCATGCG CACCCGTGGC CAGGACCCAA CGCTGCCCGA 2940

GATGCGCCGC GTGCGGCTGC TGGAGATGGC GGACGCGATG GATATGTTCT GCCAAGGGTT 3000

GGTTTGCGCA TTCACAGTTC TCCGCAAGAA TTGATTGGCT CCAATTCTTG GAGTGGTGAA 3060

TCCGTTAGCG AGGTGCCGCC GGCTTCCATT CAGGTCGAGG TGGCCCGGCT CCATGCACCG 3120

CGACGCAACG CGGGGAGGCA GACAAGGTAT AGGGCGGCGC CTACAATCCA TGCCAACCCG 3180

TTCCATGTGC TCGCCGAGGC GGCATAAATC GCCGTGACGA TCAGCGGTCC AGTGATCGAA 3240

GTTAGGCTGG TAAGAGCCGC GAGCGATCCT TGAAGCTGTC CCTGATGGTC GTCATCTACC 3300

TGCCTGGACA GCATGGCCTG CAACGCGGGC ATCCCGATGC CGCCGGAAGC GAGAAGAATC 3360

ATAATGGGGA AGGCCATCCA GCCTCGCGTC GCGAACGCCA GCAAGACGTA GCCCAGCGCG 3420

TCGGCCGCCA TGCCGGCGAT AATGGCCTGC TTCTCGCCGA AACGTTTGGT GGCGGGACCA 3480

GTGACGAAGG CTTGAGCGAG GGCGTGCAAG ATTCCGAATA CCGCAAGCGA CAGGCCGATC 3540

ATCGTCGCGC TCCAGCGAAA GCGGTCCTCG CCGAAAATGA CCCAGAGCGC TGCCGGCACC 3600

TGTCCTACGA GTTGCATGAT AAAGAAGACA GTCATAAGTG CGGCGACGAT AGTCATGCCC 3660

CGCGCCCACC GGAAGGAGCT GACTGGGTTG AAGGCTCTCA AGGGCATCGG TCGACGCTCT 3720

CCCTTATGCG ACTCCTGCAT TAGGAAGCAG CCCAGTAGTA GGTTGAGGCC GTTGAGCACC 3780

GCCGCCGCAA GGAATGGTGC ATGCAAGGAG ATGGCGCCCA ACAGTCCCCC GGCCACGGGG 3840

CCTGCCACCA TACCCACGCC GAAACAAGCG CTCATGAGCC CGAAGTGGCG AGCCCGATCT 3900

TCCCCATCGG TGATGTCGGC GATATAGGCG CCAGCAACCG CACCTGTGGC GCCGGTGATG 3960

CCGGCCACGA TGCGTCCGGC GTAGAGGATC GAGATCTCGA TCCCGCGAAA TTAATACGAC 4020

TCACTATAGG GAGACCACAA CGGTTTCCCT CTAGAAATAA TTTTGTTTAA CTTTAAGAAG 4080

GAGATATACA TATGGAACCG GTCGACCCGC GTCTGGAACC ATGGAAACAC CCCGGGTCCC 4140

AGCCGAAAAC CGCGTTCATC ACCAAAGCCC TAGGTATCTC TTACGGCCGT AAAAAACGTC 4200

GTCAGCGACG TCGTCCGCCG CAGGGATCCC AGACCCACCA GGTTTCTCTG TCTAAACAGT 4260

GATCAGCATT GGCTAGCATG ACTGGTGGAC AGCAAATGGG TCGCGGATCC GGCTGCTAAC 4320

AAAGCCCGAA AGGAAGCTGA GTTGGCTGCT GCCACCGCTG AGCAATAACT AGCATAACCC 4380

CTTGGGGCCT CTAAACGGGT CTTGAGGGGT TTTTTGCTGA AAGGAGGAAC TATATCCGGA 4440

TATCCACAGG ACGGGTGTGG TCGCCATGAT CGCGTAGTCG ATAGTGGCTC CAAGTAGCGA 4500

AGCGAGCAGG ACTGGGCGGC GGCCAAAGCG GTCGGACAGT GCTCCGAGAA CGGGTGCGCA 4560

TAGAAATTGC ATCAACGCAT ATAGCGCTAG CAGCACGCCA TAGTGACTGG CGATGCTGTC 4620

GGAATGGACG ATATCCCGCA AGAGGCCCGG CACTACCGGC ATAACCAAGC CTATGCCTAC 4680

AGCATCCAGG GTGACGGTGC CGAGGATGAC GATGAGCGCA TTGTTAGATT TCATACACGG 4740

TGCCTGACTG CGTTAGCAAT TTAACTGTGA TAAACTACCG CATTAAAGCT TATCGATGAT 4800

AAGCTGTCAA ACATGAGAA 4819

(2) INFORMATION FOR SEQ ID NO:21:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 45 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:21:

TTTACGGCCG TAAGAGATAC CTAGGGCTTT GGTGATGAAC GCGGT 45

(2) INFORMATION FOR SEQ ID NO:22:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5574 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: circular

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:22:

TTCTTGAAGA CGAAAGGGCC TCGTGATACG CCTATTTTTA TAGGTTAATG TCATGATAAT 60 

AATGGTTTCT TAGACGTCAG GTGGCACTTT TCGGGGAAAT GTGCGCGGAA CCCCTATTTG 120 

TTTATTTTTC TAAATACATT CAAATATGTA TCCGCTCATG AGACAATAAC CCTGATAAAT 180 

GCTTCAATAA TATTGAAAAA GGAAGAGTAT GAGTATTCAA CATTTCCGTG TCGCCCTTAT 240 

TCCCTTTTTT GCGGCATTTT GCCTTCCTGT TTTTGCTCAC CCAGAAACGC TGGTGAAAGT 300 

AAAAGATGCT GAAGATCAGT TGGGTGCACG AGTGGGTTAC ATCGAACTGG ATCTCAACAG 360 

CGCTAAGATC CTTGAGAGTT TTCGCCCCGA AGAACGTTTT CCAATGATGA GCACTTTTAA 420 

AGTTCTGCTA TGTGGCGCGG TATTATCCCG TGTTGACGCC GGGCAAGAGC AACTCGGTCG 480 

CCGCATACAC TATTCTCAGA ATGACTTGGT TGAGTACTCA CCAGTCACAG AAAAGCATCT 540 

TACGGATGGC ATGACAGTAA GAGAATTATG CAGTGCTGCC ATAACCATGA GTGATAACAC 600 

TGCGGCCAAC TTACTTCTGA CAACGATCGG AGGACCGAAG GAGCTAACCG CTTTTTTGCA 660 



CAACATGGGG GATCATGTAA CTCGCCTTGA TCGTTGGGAA CCGGAGCTGA ATGAAGCCAT 720

ACCAAACGAC GAGCGTGACA CCACGATGCC TGCAGCAATG GCAACAACGT TGCGCAAACT 780

ATTAACTGCC GAACTACTTA CTCTAGCTTC CCGGCAACAA TTAATAGACT GGATGGAGGC 840

GGATAAAGTT GCAGGACCAC TTCTGCGCTC GGCCCTTCCG GCTGGCTGGT TTATTGCTGA 900

TAAATCTGCA GCCGGTGACC GTGGGTCTCG CGGTATCATT GCAGCACTGG GGCCAGATGG 960

TAAGCCCTCC CGTATCGTAG TTATCTACAC GACGGGGAGT CAGGCAACTA TGGATGAACG 1020

AAATAGACAG ATCGCTGAGA TAGGTGCCTC ACTGATTAAG CATTGGTAAC TGTCAGACCA 1080

AGTTTACTCA TATATACTTT AGATTGATTT AAAACTTCAT TTTTAATTTA AAAGGATCTA 1140

GGTGAAGATC CTTTTTGATA ATCTCATGAC CAAAATCCCT TAACGTGAGT TTTCGTTCCA 1200

CTGAGCGTCA GACCCCGTAG AAAAGATCAA AGGATCTTCT TGAGATCCTT TTTTTCTGCG 1260

CGTAATCTGC TGCTTGCAAA CAAAAAAACC ACCGCTACCA GCGGTGGTTT GTTTGCCGGA 1320

TCAAGAGCTA CCAACTCTTT TTCCGAAGGT AACTGGCTTC AGCAGAGCGC AGATACCAAA 1380

TACTGTCCTT CTAGTGTAGC CGTAGTTAGG CCACCACTTC AAGAACTCTG TAGCACCGCC 1440

TACATACCTC GCTCTCCTAA TCCTGTTACC AGTGGCTGCT GCCAGTGGCG ATAAGTCGTG 1500

TCTTACCGGG TTGGACTCAA GACGATAGTT ACCGGATAAG GCGCAGCGGT CGGGCTGAAC 1560

GGGCGGTTCG TGCACACAGC CCAGCTTGGA GCGAACGACC TACACCGAAC TGAGATACCT 1620

ACAGCGTGAG CATTGAGAAA GCGCCACGCT TCCCGAAGGG AGAAAGGCGG ACAGGTATCC 1680

GGTAAGCGGC AGGGTCGGAA CAGGAGAGCG CACGAGGGAG CTTCCAGGGG GAAACGCCTG 1740

GTATCTTTAT AGTCCTGTCG GGTTTCGCCA CCTCTGACTT GAGCGTCGAT TTTTGTGATG 1800

CTCGTCAGGG GGGCGGAGCC TATGGAAAAA CGCCAGCAAC GCGGCCTTTT TACGGTTCCT 1860

GGCCTTTTGC TGGCCTTTTG CTCACATGTT CTTTCCTGCG TTATCCCCTG ATTCTGTGGA 1920

TAACCGTATT ACCGCCTTTG AGTGAGCTGA TACCGCTCGC CGCAGCCGAA CGACCGAGCG 1980

CAGCGAGTCA GTGAGCGAGG AAGCGGAAGA GCCCCTGATG CGGTATTTTC TCCTTACGCA 2040

TCTGTGCGGT ATTTCACACC GCATATATGG TGCACTCTCA GTACAATCTG CTCTGATGCC 2100

CCATACTTAA GCCAGTATAC ACTCCGCTAT CGCTACGTGA CTGGGTCATG GCTGCGCCCC 2160

GACACCCGCC AACACCCGCT GACGCGCCCT GACGGCCTTG TCTGCTCCCG GCATCCGCTT 2220

ACAGACAAGC TGTGACCGTC TCCGGGAGCT GCATGTGTCA GAGGTTTTCA CCGTCATCAC 2280

CGAAACGCGC GAGGCAGCTG CGGTAAAGCT CATCAGCGTG GTCGTGAAGC GATTCACAGA 2340

TGTCTGCCTG TTCATCCGCG TCCAGCTCGT TGAGTTTCTC CAGAAGCGTT AATGTCTGGC 2400

TTCTGATAAA GCGGGCCATG TTAAGGGCGG TTTTTTCCTG TTTGGTCACT TGATGCCTCC 2460

CTGTAAGGGG GAATTTCTGT TCATGGGGGT AATGATACCG ATGAAACGAG AGAGGATGCT 2520

CACCATACGG GTTACTGATC ATGAACATGC CCGGTTACTG GAACGTTGTG AGGGTAAACA 2580

ACTGGCGGTA TGGATGCGGC GGGACCAGAG AAAAATCACT CAGGGTCAAT GCCAGCGCTT 2640

CGTTAATACA GATGTAGGTG TTCCACAGGG TAGCCACCAG CATCCTGCGA TGCAGATCCG 2700

GAACATAATG GTGCAGGGCG CTGACTTCCG CGTTTCCAGA CTTTACGAAA CACGGAAACC 2760

GAAGACCATT CATGTTGTTG CTCAGGTCGC AGACGTTTTG CAGCAGCAGT CGCTTCACGT 2820

TCGCTCGCGT ATCGGTGATT CATTCTCCTA ACCAGTAAGG CAACCCCGCC AGCCTAGCCG 2880

GGTCCTCAAC GACAGGAGCA CGATCATGCG CACCCGTGGC CAGGACCCAA CGCTGCCCGA 2940

GATGCGCCGC GTGCGGCTGC TGGAGATGGC GGACGCGATG GATATGTTCT GCCAAGGGTT 3000

GGTTTGCGCA TTCACAGTTC TCCGCAAGAA TTGATTGGCT CCAATTCTTG GAGTGGTGAA 3060

TCCGTTAGCG AGGTGCCGCC GGCTTCCATT CAGGTCGAGG TGCCCCGGCT CCATCCACCG 3120

CGACGCAACG CGGGGAGGCA GACAAGGTAT AGGGCGGCGC CTACAATCCA TGCCAACCCG 3180

TTCCATGTGC TCGCCGAGGC GGCATAAATC GCCGTGACGA TCAGCGGTCC AGTGATCGAA 3240

GTTAGGCTGG TAAGAGCCGC GAGCGATCCT TGAAGCTGTC CCTGATGGTC GTCATCTACC 3300

TGCCTGGACA GCATGGCCTG CAACGCGGGC ATCCCGATGC CGCCGGAAGC GAGAAGAATC 3360

ATAATGGGGA AGGCCATCCA GCCTCGCGTC GCGAACGCCA GCAAGACGTA GCCCAGCGCG 3420

TCGGCCGCCA TGCCGGCGAT AATGGCCTGC TTCTCGCCGA AACGTTTGGT GGCGGGACCA 3480

GTGACGAAGG CTTGAGCGAG GGCGTGCAAG ATTCCGAATA CCGCAAGCGA CAGGCCGATC 3540

ATCGTCGCGC TCCAGCGAAA GCGGTCCTCG CCGAAAATGA CCCAGAGCGC TGCCGGCACC 3600

TGTCCTACGA GTTGCATGAT AAAGAAGACA GTCATAACTG CGGCGACGAT AGTCATGCCC 3660

CGCGCCCACC GGAAGGAGCT GACTGGGTTG AAGGCTCTCA AGGGCATCGG TCGACGCTCT 3720

CCCTTATGCG ACTCCTGCAT TAGGAAGCAG CCCAGTAGTA GGTTGAGGCC GTTGAGCACC 3780

GCCGCCGCAA GGAATGGTGC ATGCAAGGAG ATGGCGCCCA ACAGTCCCCC GGCCACGGGG 3840

CCTGCCACCA TACCCACGCC GAAACAAGCG CTCATGAGCC CGAAGTGGCG AGCCCGATCT 3900

TCCCCATCGG TGATGTCGGC GATATAGGCG CCAGCAACCG CACCTGTGGC GCCGGTGATG 3960

CCGGCCACGA TGCGTCCGGC GTAGAGGATC GAGATCTCGA TCCCGCGAAA TTAATACGAC 4020

TCACTATAGG GAGACCACAA CGGTTTCCCT CTAGAAATAA TTTTGTTTAA CTTTAAGAAG 4080

GAGATATACA TATGGAACCG GTCGACCCGC GTCTGGAACC ATGGAAACAC CCCGGGTCCC 4140 

AGCCGAAAAC CGCGTTCATC ACCAAAGCCC TAGGTATCTC TTACGGCCGT AAAAAACGTC 4200 

GTCAGCGACG TCGTCCGCCG CAGGGATCTT CCATGGCCGG TGCTGGACGC ATTTACTATT 4260 

CTCGCTTTGG TGACGAGGCA GCCAGATTTA GTACAACAGG GCATTACTCT GTAAGAGATC 4320

AGGACAGAGT GTATGCTGGT GTCTCATCCA CCTCTTCTGA TTTTAGAGAT CGCCCAGACG 4380 

GAGTCTGGGT CGCATCCGAA GGACCTGAAG GAGACCCTGC AGGAAAAGAA GCCGAGCCAG 4440 

CCCAGCCTGT CTCTTCTTTG CTCGGCTCCC CCGCCTGCGG TCCCATCAGA GCAGGCCTCG 4500 

GTTGGGTACG GGACGGTCCT CGCTCGCACC CCTACAATTT TCCTGCAGGC TCGGGGGGCT 4560 

CTATTCTCCG CTCTTCCTCC ACCCCGGTGC AGGGCACGGT ACCGGTGGAC TTGGCATCAA 4620

GGCAGGAAGA AGAGGAGCAG TCGCCCGACT CCACAGAGGA AGAACCAGTG ACTCTCCCAA 4680 

GGCGCACCAC CAATGATGGA TTCCACCTGT TAAAGGCAGG AGGGTCATGC TTTGCTCTAA 4740 

TTTCAGGAAC TGCTAACCAG GTAAAGTGCT ATCGCTTTCG GGTGAAAAAG AACCATAGAC 4800 

ATCCCTACGA GAACTGCACC ACCACCTGGT TCACAGTTGC TGACAACGGT GCTGAAAGAC 4860 

AAGGACAAGC ACAAATACTG ATCACCTTTG GATCGCCAAG TCAAAGGCAA GACTTTCTGA 4920

AACATGTACC ACTACCTCCT GGAATGAACA TTTCCGGCTT TACAGCCAGC TTGGACTTCT 4980 

GATCACTGCC ATTGCCTTTT CTTCATCTGA CTGGTGTACT ATGCCAAATC TATGGTTTCT 5040 

ATTGTTCTTG GGACTAGGAA GATCCGGCTG CTAACAAAGC CCGAAAGGAA GCTGAGTTGG 5100 

CTGCTGCCAC CGCTGAGCAA TAACTACCAT AACCCCTTGG GGCCTCTAAA CGGGTCTTGA 5160 

GGGGTTTTTT GCTGAAAGGA GGAACTATAT CCGGATATCC ACAGGACGGG TGTGGTCGCC 5220

ATGATCGCGT AGTCGATAGT GGCTCCAAGT AGCGAAGCGA GCAGGACTGG GCGGCGGCCA 5280 

AAGCGGTCGG ACAGTGCTCC CAGAACGGGT GCGCATAGAA ATTGCATCAA CGCATATAGC 5340 

GCTAGCAGCA CGCCATAGTG ACTGGCGATG CTGTCGGAAT GGACGATATC CCGCAAGAGG 5400 

CCCGGCAGTA CCGGCATAAC CAAGCCTATG CCTACAGCAT CCAGGGTGAC GGTGCCGAGG 5460 

ATGACGATGA GCGCATTGTT AGATTTCATA CACGGTGCCT GACTGCGTTA CCAATTTAAC 5520

TGTGATAAAC TACCGCATTA AAGCTTATCG ATGATAAGCT GTCAAACATG AGAA 5574 

(2) INFORMATION FOR SEQ ID NO:23:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:23:



GATCCCAGAC CCACCAGGTT 

(2) INFORMATION FOR SEQ ID NO:24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 17 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:24: 



GAACCTGGTG GGTCTGG 

(2) INFORMATION FOR SEQ ID NO:25: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:25: 

CGTCCGCCGC AGGGATCGCA GACCCACCAG GTTTCTCTGT CTAAACAGGC 

(2) INFORMATION FOR SEQ ID NO:26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 58 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:26: 



EP0 656 950 B1 



CATGGCCTGT TTAGACAGAG AAACCTGGTG GGTCTGCGAT CCCTGCGGGG GACGACGT 58 

(2) INFORMATION FOR SEQ ID NO:27:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 48 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:27:

CATGTACGGC CGTAAAAAAC GTCGTCAGCG ACGTCGTCCG CCGGACAC



(2) INFORMATION FOR SEQ ID NO:28:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 46 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:28:

CGGTGTCCGG CGGACGACGT CGCTGACGAC GTTTTTTACG GCCGTA 46

(2) INFORMATION FOR SEQ ID NO:29:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 26 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:29:

ATCATCGATA AGCTTTAATG CGGTAG 26

(2) INFORMATION FOR SEQ ID NO:30:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 52 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:30:



ACTTTAAGAA GGAGATATAC ATATGTTCAT CACCAAAGCC CTAGGTATCT CT 



(2) INFORMATION FOR SEQ ID NO:31:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 51 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:31:

ACTTTAAGAA GGAGATATAC ATATGTACGG CCGTAAAAAA CGTCGTCAGC G 
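
As a consistency check on the listing, the reading frame carried by the SEQ ID NO:31 oligonucleotide can be translated and compared with the amino-terminal residues listed for SEQ ID NO:38. The Python sketch below is illustrative only and is not part of the original disclosure; the helper name `translate_from_first_atg` and the use of the standard genetic code are assumptions made for this example.

```python
# Illustrative sketch (not from the patent): translate the reading frame that
# starts at the first ATG of the SEQ ID NO:31 oligonucleotide.
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# Sequence as printed in the listing; blanks between 10-base blocks removed.
SEQ_ID_31 = "ACTTTAAGAA GGAGATATAC ATATGTACGG CCGTAAAAAA CGTCGTCAGC G".replace(" ", "")


def translate_from_first_atg(dna: str) -> str:
    """Return one-letter residues for the complete codons after the first ATG."""
    start = dna.find("ATG")
    if start < 0:
        return ""
    codons = (dna[i:i + 3] for i in range(start, len(dna) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


peptide = translate_from_first_atg(SEQ_ID_31)
print(len(SEQ_ID_31), peptide)
```

Under these assumptions the sketch reports a 51-base oligonucleotide encoding Met-Tyr-Gly-Arg-Lys-Lys-Arg-Arg-Gln, which agrees with the first nine residues listed for SEQ ID NO:38.
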
(2) INFORMATION FOR SEQ ID NO:32:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 52 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:32:

AACGTCGTCA GCGACGTCGT CCCCCGGACA CCGCAAACCC CTCCCACACC AC 

(2) INFORMATION FOR SEQ ID NO:33:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:33:

CGAAAAGTGC CACCTGACGT CTAAGAAACC 30

(2) INFORMATION FOR SEQ ID NO:34:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 26 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:34:

CTCCCATGGC TAGCAACACT ACACCC 26 

(2) INFORMATION FOR SEQ ID NO:35:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 10 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:35:

GAAGATCTTC 10

(2) INFORMATION FOR SEQ ID NO:36: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:36: 

CAGAGGAAGC CATGGTGACT CTCCCAA 27 

(2) INFORMATION FOR SEQ ID NO:37:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 27 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:37:

AAGGCAATCG ATCCGATCAG AAGTCCA 27

(2) INFORMATION FOR SEQ ID NO:38:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 134 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:38:

Met Tyr Gly Arg Lys Lys Arg Arg Gln Arg Arg Arg Pro Pro Asp Thr
1 5 10 15

Gly Asn Pro Cys His Thr Thr Lys Leu Leu His Arg Asp Ser Val Asp
20 25 30

Ser Ala Pro Ile Leu Thr Ala Phe Asn Ser Ser His Lys Gly Arg Ile
35 40 45

Asn Cys Asn Ser Asn Thr Thr Pro Ile Val His Leu Lys Gly Asp Ala
50 55 60

Asn Thr Leu Lys Cys Leu Arg Tyr Arg Phe Lys Lys His Cys Thr Leu
65 70 75 80

Tyr Thr Ala Val Ser Ser Thr Trp His Trp Thr Gly His Asn Val Lys
85 90 95

His Lys Ser Ala Ile Val Thr Leu Thr Tyr Asp Ser Glu Trp Gln Arg
100 105 110

Asp Gln Phe Leu Ser Gln Val Lys Ile Pro Lys Thr Ile Thr Val Ser
115 120 125

Thr Gly Phe Met Ser Ile
130



(2) INFORMATION FOR SEQ ID NO:39:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 55 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:39:

CATGTACGGC CCTAAAAAAC CTCGTCAGCG ACGTCGTCCG CTGAGTCAGC CCCAC 55

(2) INFORMATION FOR SEQ ID NO:40:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 51 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:40:

CTCGGCCTGA CTCAGCGGAC GACCTCGCTC ACGACGTTTT TTACGGCCGT A 51

(2) INFORMATION FOR SEQ ID NO:41:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 46 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:41:

TCCTTCCTGT CCGCTGGTCA CCCCCCCCGC CCCCTCTCCA CCTAAG 46

(2) INFORMATION FOR SEQ ID NO:42:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 54 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:42: 

AATTCTTAGG TGGACAGGCG GCGCGGGCGC TGACCAGCGG ACAGGAAGGA CATG 

(2) INFORMATION FOR SEQ ID NO:43:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 39 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:43: 



GGGGACTTTC CGCTGGGGAC TTTCCACGGG GGACTTTCC 



(2) INFORMATION FOR SEQ ID NO:44: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 39 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:44: 

GGAAAGTCCC CCGTGGAAAG TCCCCAGCGG AAAGTCCCC 

(2) INFORMATION FOR SEQ ID NO:45:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 39 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:45:

GTCTACTTTC CGCTGTCTAC TTTCCACGGT CTACTTTCC 39 

(2) INFORMATION FOR SEQ ID NO:46:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 39 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:46:

GGAAAGTAGA CCGTGGAAAG TAGACAGCGG AAAGTAGAC 39 
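
The 39-base oligonucleotides of SEQ ID NOS:43 to 46 read as two complementary pairs. The Python sketch below, which is illustrative only and not part of the original disclosure, checks that reading by computing reverse complements; the helper names `reverse_complement` and `clean` are assumptions introduced for this example.

```python
# Illustrative sketch (not from the patent): check that listed oligonucleotide
# pairs are reverse complements of each other.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def clean(listing_row: str) -> str:
    """Drop the blanks that separate 10-base blocks in the listing."""
    return listing_row.replace(" ", "")


SEQ_ID_43 = clean("GGGGACTTTC CGCTGGGGAC TTTCCACGGG GGACTTTCC")
SEQ_ID_44 = clean("GGAAAGTCCC CCGTGGAAAG TCCCCAGCGG AAAGTCCCC")
SEQ_ID_45 = clean("GTCTACTTTC CGCTGTCTAC TTTCCACGGT CTACTTTCC")
SEQ_ID_46 = clean("GGAAAGTAGA CCGTGGAAAG TAGACAGCGG AAAGTAGAC")

pairs_match = (
    reverse_complement(SEQ_ID_43) == SEQ_ID_44
    and reverse_complement(SEQ_ID_45) == SEQ_ID_46
)
print(pairs_match)
```

With the sequences as printed, both comparisons hold, so each pair could anneal to form a double-stranded 39-mer.
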



(2) INFORMATION FOR SEQ ID NO:47:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 12 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:47:

Tyr Gly Arg Lys Lys Arg Arg Gln Arg Arg Arg Pro
1 5 10



(2) INFORMATION FOR SEQ ID NO:48:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 26 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:48:

Tyr Gly Arg Lys Lys Arg Arg Gln Arg Arg Arg Pro Pro Gln Gly Ser
1 5 10 15

Gln Thr His Gln Val Ser Leu Ser Lys Gln
20 25

(2) INFORMATION FOR SEQ ID NO:49:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 35 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:49:

Phe Ile Thr Lys Ala Leu Gly Ile Ser Tyr Gly Arg Lys Lys Arg Arg
1 5 10 15

Gln Arg Arg Arg Pro Pro Gln Gly Ser Gln Thr His Gln Val Ser Leu
20 25 30

Ser Lys Gln
35

(2) INFORMATION FOR SEQ ID NO:50: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 21 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:50:

Phe Ile Thr Lys Ala Leu Gly Ile Ser Tyr Gly Arg Lys Lys Arg Arg
1 5 10 15

Gln Arg Arg Arg Pro
20

(2) INFORMATION FOR SEQ ID NO:51:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 121 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:51:

Pro Asp Thr Gly Asn Pro Cys His Thr Thr Lys Leu Leu His Arg Asp
1 5 10 15

Ser Val Asp Ser Ala Pro Ile Leu Thr Ala Phe Asn Ser Ser His Lys
20 25 30

Gly Arg Ile Asn Cys Asn Ser Asn Thr Thr Pro Ile Val His Leu Lys
35 40 45

Gly Asp Ala Asn Thr Leu Lys Cys Leu Arg Tyr Arg Phe Lys Lys His
50 55 60

Cys Thr Leu Tyr Thr Ala Val Ser Ser Thr Trp His Trp Thr Gly His
65 70 75 80

Asn Val Lys His Lys Ser Ala Ile Val Thr Leu Thr Tyr Asp Ser Glu
85 90 95

Trp Gln Arg Asp Gln Phe Leu Ser Gln Val Lys Ile Pro Lys Thr Ile
100 105 110

Thr Val Ser Thr Gly Phe Met Ser Ile
115 120

(2) INFORMATION FOR SEQ ID NO:52: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:52: 



Gly Arg Lys Lys Arg Arg Gln Arg Arg Arg Pro Pro Gln Gly Ser
1 5 10 15

(2) INFORMATION FOR SEQ ID NO:53:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 25 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:53:

Phe Ile Thr Lys Ala Leu Gly Ile Ser Tyr Gly Arg Lys Lys Arg Arg
1 5 10 15

Gln Arg Arg Arg Pro Pro Gln Gly Ser
20 25

(2) INFORMATION FOR SEQ ID NO:54: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 85 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:54:

Cys Asn Ser Asn Thr Thr Pro Ile Val His Leu Lys Gly Asp Ala Asn
1 5 10 15

Thr Leu Lys Cys Leu Arg Tyr Arg Phe Lys Lys His Cys Thr Leu Tyr
20 25 30

Thr Ala Val Ser Ser Thr Trp His Trp Thr Gly His Asn Val Lys His
35 40 45

Lys Ser Ala Ile Val Thr Leu Thr Tyr Asp Ser Glu Trp Gln Arg Asp
50 55 60

Gln Phe Leu Ser Gln Val Lys Ile Pro Lys Thr Ile Thr Val Ser Thr
65 70 75 80

Gly Phe Met Ser Ile
85

(2) INFORMATION FOR SEQ ID NO:55: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 121 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:55:

Pro Asp Thr Gly Asn Pro Cys His Thr Thr Lys Leu Leu His Arg Asp
1 5 10 15

Ser Val Asp Ser Ala Pro Ile Leu Thr Ala Phe Asn Ser Ser His Lys
20 25 30

Gly Arg Ile Asn Cys Asn Ser Asn Thr Thr Pro Ile Val His Leu Lys
35 40 45

Gly Asp Ala Asn Thr Leu Lys Ser Leu Arg Tyr Arg Phe Lys Lys His
50 55 60

Ser Thr Leu Tyr Thr Ala Val Ser Ser Thr Trp His Trp Thr Gly His
65 70 75 80

Asn Val Lys His Lys Ser Ala Ile Val Thr Leu Thr Tyr Asp Ser Glu
85 90 95

Trp Gln Arg Asp Gln Phe Leu Ser Gln Val Lys Ile Pro Lys Thr Ile
100 105 110

Thr Val Ser Thr Gly Phe Met Ser Ile
115 120

(2) INFORMATION FOR SEQ ID NO:56:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 161 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:56:

Leu Gly Trp Val Arg Asp Gly Pro Arg Ser His Pro Tyr Asn Phe Pro
1 5 10 15

Ala Gly Ser Gly Gly Ser Ile Leu Arg Ser Ser Ser Thr Pro Val Gln
20 25 30

Gly Thr Val Pro Val Asp Leu Ala Ser Arg Gln Glu Glu Glu Glu Gln
35 40 45

Ser Pro Asp Ser Thr Glu Glu Glu Pro Val Thr Leu Pro Arg Arg Thr
50 55 60

Thr Asn Asp Gly Phe His Leu Leu Lys Ala Gly Gly Ser Cys Phe Ala
65 70 75 80

Leu Ile Ser Gly Thr Ala Asn Gln Val Lys Cys Tyr Arg Phe Arg Val
85 90 95

Lys Lys Asn His Arg His Arg Tyr Glu Asn Cys Thr Thr Thr Trp Phe
100 105 110

Thr Val Ala Asp Asn Gly Ala Glu Arg Gln Gly Gln Ala Gln Ile Leu
115 120 125

Ile Thr Phe Gly Ser Pro Ser Gln Arg Gln Asp Phe Leu Lys His Val
130 135 140

Pro Leu Pro Pro Gly Met Asn Ile Ser Gly Phe Thr Ala Ser Leu Asp
145 150 155 160

Phe



(2) INFORMATION FOR SEQ ID NO:57: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 249 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:57:

Met Ala Gly Ala Gly Arg Ile Tyr Tyr Ser Arg Phe Gly Asp Glu Ala
1 5 10 15

Ala Arg Phe Ser Thr Thr Gly His Tyr Ser Val Arg Asp Gln Asp Arg
20 25 30

Val Tyr Ala Gly Val Ser Ser Thr Ser Ser Asp Phe Arg Asp Arg Pro
35 40 45

Asp Gly Val Trp Val Ala Ser Glu Gly Pro Glu Gly Asp Pro Ala Gly
50 55 60

Lys Glu Ala Glu Pro Ala Gln Pro Val Ser Ser Leu Leu Gly Ser Pro
65 70 75 80

Ala Cys Gly Pro Ile Arg Ala Gly Leu Gly Trp Val Arg Asp Gly Pro
85 90 95

Arg Ser His Pro Tyr Asn Phe Pro Ala Gly Ser Gly Gly Ser Ile Leu
100 105 110

Arg Ser Ser Ser Thr Pro Val Gln Gly Thr Val Pro Val Asp Leu Ala
115 120 125

Ser Arg Gln Glu Glu Glu Glu Gln Ser Pro Asp Ser Thr Glu Glu Glu
130 135 140

Pro Val Thr Leu Pro Arg Arg Thr Thr Asn Asp Gly Phe His Leu Leu
145 150 155 160

Lys Ala Gly Gly Ser Cys Phe Ala Leu Ile Ser Gly Thr Ala Asn Gln
165 170 175

Val Lys Cys Tyr Arg Phe Arg Val Lys Lys Asn His Arg His Arg Tyr
180 185 190

Glu Asn Cys Thr Thr Thr Trp Phe Thr Val Ala Asp Asn Gly Ala Glu
195 200 205

Arg Gln Gly Gln Ala Gln Ile Leu Ile Thr Phe Gly Ser Pro Ser Gln
210 215 220

Arg Gln Asp Phe Leu Lys His Val Pro Leu Pro Pro Gly Met Asn Ile
225 230 235 240

Ser Gly Phe Thr Ala Ser Leu Asp Phe
245



(2) INFORMATION FOR SEQ ID NO:58: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 385 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:58:

Met Tyr Gly Arg Lys Lys Arg Arg Gln Arg Arg Arg Pro Leu Ser Gln
1 5 10 15

Ala Gln Leu Met Pro Ser Pro Pro Met Pro Val Pro Pro Ala Ala Leu
20 25 30

Phe Asn Arg Leu Leu Asp Asp Leu Gly Phe Ser Ala Gly Pro Ala Leu
35 40 45

Cys Thr Met Leu Asp Thr Trp Asn Glu Asp Leu Phe Ser Gly Phe Pro
50 55 60

Thr Asn Ala Asp Met Tyr Arg Glu Cys Lys Phe Leu Ser Thr Leu Pro
65 70 75 80

Ser Asp Val Ile Asp Trp Gly Asp Ala His Val Pro Glu Arg Ser Pro
85 90 95

Ile Asp Ile Arg Ala His Gly Asp Val Ala Phe Pro Thr Leu Pro Ala
100 105 110

Thr Arg Asp Glu Leu Pro Ser Tyr Tyr Glu Ala Met Ala Gln Phe Phe
115 120 125

Arg Gly Glu Leu Arg Ala Arg Glu Glu Ser Tyr Arg Thr Val Leu Ala
130 135 140

Asn Phe Cys Ser Ala Leu Tyr Arg Tyr Leu Arg Ala Ser Val Arg Gln
145 150 155 160

Leu His Arg Gln Ala His Met Arg Gly Arg Asa Arg Asp Leu Arg Glu
165 170 175

Met Leu Arg Thr Thr Ile Ala Asp Arg Tyr Tyr Arg Glu Thr Ala Arg
180 185 190

Leu Ala Arg Val Leu Phe Leu His Leu Tyr Leu Phe Leu Ser Arg Glu
195 200 205

Ile Leu Trp Ala Ala Tyr Ala Glu Gln Met Met Arg Pro Asp Leu Phe
210 215 220

Asp Gly Leu Cys Cys Asp Leu Glu Ser Trp Arg Gln Leu Ala Cys Leu
225 230 235 240

Phe Gln Pro Leu Met Phe Ile Asn Gly Ser Leu Thr Val Arg Gly Val
245 250 255

Pro Val Glu Ala Arg Arg Leu Arg Glu Leu Asn His Ile Arg Glu His
260 265 270

Leu Asn Leu Pro Leu Val Arg Ser Ala Ala Ala Glu Glu Pro Gly Ala
275 280 285

Pro Leu Thr Thr Pro Pro Val Leu Gln Gly Asn Gln Ala Arg Ser Ser
290 295 300

Gly Tyr Phe Met Leu Leu Ile Arg Ala Lys Leu Asp Ser Tyr Ser Ser
305 310 315 320

Val Ala Thr Ser Glu Gly Glu Ser Val Met Arg Glu His Ala Tyr Ser
325 330 335

Arg Gly Arg Thr Arg Asn Asn Tyr Gly Ser Thr Ile Glu Gly Leu Leu
340 345 350

Asp Leu Pro Asp Asp Asp Asp Ala Pro Ala Glu Ala Gly Leu Val Ala
355 360 365

Pro Arg Met Ser Phe Leu Ser Ala Gly Gln Arg Pro Arg Arg Leu Ser
370 375 380

Thr
385



(2) INFORMATION FOR SEQ ID NO:59:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 148 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:59:

Met Tyr Gly Arg Lys Lys Arg Arg Gln Arg Arg Arg Pro Pro Gln Gly
1 5 10 15

Ser Gln Thr His Gln Val Ser Leu Ser Lys Gln Pro Asp Thr Gly Asn
20 25 30

Pro Cys His Thr Thr Lys Leu Leu His Arg Asp Ser Val Asp Ser Ala
35 40 45

Pro Ile Leu Thr Ala Phe Asn Ser Ser His Lys Gly Arg Ile Asn Cys
50 55 60

Asn Ser Asn Thr Thr Pro Ile Val His Leu Lys Gly Asp Ala Asn Thr
65 70 75 80

Leu Lys Cys Leu Arg Tyr Arg Phe Lys Lys His Cys Thr Leu Tyr Thr
85 90 95

Ala Val Ser Ser Thr Trp His Trp Thr Gly His Asn Val Lys His Lys
100 105 110

Ser Ala Ile Val Thr Leu Thr Tyr Asp Ser Glu Trp Gln Arg Asp Gln
115 120 125

Phe Leu Ser Gln Val Lys Ile Pro Lys Thr Ile Thr Val Ser Thr Gly
130 135 140

Phe Met Ser Ile
145

(2) INFORMATION FOR SEQ ID NO:60: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 157 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:60:

Met Phe Ile Thr Lys Ala Leu Gly Ile Ser Tyr Gly Arg Lys Lys Arg
1 5 10 15

Arg Gln Arg Arg Arg Pro Pro Gln Gly Ser Gln Thr His Gln Val Ser
20 25 30

Leu Ser Lys Gln Pro Asp Thr Gly Asn Pro Cys His Thr Thr Lys Leu
35 40 45

Leu His Arg Asp Ser Val Asp Ser Ala Pro Ile Leu Thr Ala Phe Asn
50 55 60

Ser Ser His Lys Gly Arg Ile Asn Cys Asn Ser Asn Thr Thr Pro Ile
65 70 75 80

Val His Leu Lys Gly Asp Ala Asn Thr Leu Lys Cys Leu Arg Tyr Arg
85 90 95

Phe Lys Lys His Cys Thr Leu Tyr Thr Ala Val Ser Ser Thr Trp His
100 105 110

Trp Thr Gly His Asn Val Lys His Lys Ser Ala Ile Val Thr Leu Thr
115 120 125

Tyr Asp Ser Glu Trp Gln Arg Asp Gln Phe Leu Ser Gln Val Lys Ile
130 135 140

Pro Lys Thr Ile Thr Val Ser Thr Gly Phe Met Ser Ile
145 150 155

(2) INFORMATION FOR SEQ ID NO:61: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 177 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:61:

Met Gly Arg Lys Lys Arg Arg Gln Arg Arg Arg Pro Pro Gln Gly Ser
1 5 10 15

Leu Gly Trp Val Arg Asp Gly Pro Arg Ser His Pro Tyr Asn Phe Pro
20 25 30

Ala Gly Ser Gly Gly Ser Ile Leu Arg Ser Ser Ser Thr Pro Val Gln
35 40 45

Gly Thr Val Pro Val Asp Leu Ala Ser Arg Gln Glu Glu Glu Glu Gln
50 55 60

Ser Pro Asp Ser Thr Glu Glu Glu Pro Val Thr Leu Pro Arg Arg Thr
65 70 75 80

Thr Asn Asp Gly Phe His Leu Leu Lys Ala Gly Gly Ser Cys Phe Ala
85 90 95

Leu Ile Ser Gly Thr Ala Asn Gln Val Lys Cys Tyr Arg Phe Arg Val
100 105 110

Lys Lys Asn His Arg His Arg Tyr Glu Asn Cys Thr Thr Thr Trp Phe
115 120 125

Thr Val Ala Asp Asn Gly Ala Glu Arg Gln Gly Gln Ala Gln Ile Leu
130 135 140

Ile Thr Phe Gly Ser Pro Ser Gln Arg Gln Asp Phe Leu Lys His Val
145 150 155 160

Pro Leu Pro Pro Gly Met Asn Ile Ser Gly Phe Thr Ala Ser Leu Asp
165 170 175

Phe

(2) INFORMATION FOR SEQ ID NO:62:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 187 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:62:

Met Phe Ile Thr Lys Ala Leu Gly Ile Ser Tyr Gly Arg Lys Lys Arg
1 5 10 15

Arg Gln Arg Arg Arg Pro Pro Gln Gly Ser Leu Gly Trp Val Arg Asp
20 25 30

Gly Pro Arg Ser His Pro Tyr Asn Phe Pro Ala Gly Ser Gly Gly Ser
35 40 45

Ile Leu Arg Ser Ser Ser Thr Pro Val Gln Gly Thr Val Pro Val Asp
50 55 60

Leu Ala Ser Arg Gln Glu Glu Glu Glu Gln Ser Pro Asp Ser Thr Glu
65 70 75 80

Glu Glu Pro Val Thr Leu Pro Arg Arg Thr Thr Asn Asp Gly Phe His
85 90 95

Leu Leu Lys Ala Gly Gly Ser Cys Phe Ala Leu Ile Ser Gly Thr Ala
100 105 110

Asn Gln Val Lys Cys Tyr Arg Phe Arg Val Lys Lys Asn His Arg His
115 120 125

Arg Tyr Glu Asn Cys Thr Thr Thr Trp Phe Thr Val Ala Asp Asn Gly
130 135 140

Ala Glu Arg Gln Gly Gln Ala Gln Ile Leu Ile Thr Phe Gly Ser Pro
145 150 155 160

Ser Gln Arg Gln Asp Phe Leu Lys His Val Pro Leu Pro Pro Gly Met
165 170 175

Asn Ile Ser Gly Phe Thr Ala Ser Leu Asp Phe
180 185

(2) INFORMATION FOR SEQ ID NO:63:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 143 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:63:

Met Phe Ile Thr Lys Ala Leu Gly Ile Ser Tyr Gly Arg Lys Lys Arg
1 5 10 15

Arg Gln Arg Arg Arg Pro Pro Asp Thr Gly Asn Pro Cys His Thr Thr
20 25 30

Lys Leu Leu His Arg Asp Ser Val Asp Ser Ala Pro Ile Leu Thr Ala
35 40 45

Phe Asn Ser Ser His Lys Gly Arg Ile Asn Cys Asn Ser Asn Thr Thr
50 55 60

Pro Ile Val His Leu Lys Gly Asp Ala Asn Thr Leu Lys Cys Leu Arg
65 70 75 80

Tyr Arg Phe Lys Lys His Cys Thr Leu Tyr Thr Ala Val Ser Ser Thr
85 90 95

Trp His Trp Thr Gly His Asn Val Lys His Lys Ser Ala Ile Val Thr
100 105 110

Leu Thr Tyr Asp Ser Glu Trp Gln Arg Asp Gln Phe Leu Ser Gln Val
115 120 125

Lys Ile Pro Lys Thr Ile Thr Val Ser Thr Gly Phe Met Ser Ile
130 135 140



Claims 

1. A fusion protein consisting of a carboxy-terminal cargo moiety and an amino-terminal transport moiety, wherein

(a) the transport moiety is characterized by:

(i) the presence of amino acids 49-57 of HIV tat protein;

(ii) the absence of amino acids 22-36 of HIV tat protein; and

(iii) the absence of amino acids 73-86 of HIV tat protein; and

(b) the cargo moiety retains biological activity following transport moiety-dependent intracellular delivery.
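The three structural criteria (i) to (iii) can be illustrated for the simple case of a contiguous tat fragment identified by its first and last residue numbers. The following is a minimal sketch under stated assumptions, not a construction of the claim: it assumes "absence" means that no residue of the named region is present, and the function names `overlaps` and `meets_transport_criteria` are hypothetical.

```python
# Residue ranges (inclusive, HIV tat numbering) named in the claim.
BASIC_REGION = (49, 57)          # must be present
CYSTEINE_RICH_REGION = (22, 36)  # must be absent
EXON2_DOMAIN = (73, 86)          # must be absent


def overlaps(fragment, region):
    """Return True if two inclusive residue ranges share at least one position."""
    return fragment[0] <= region[1] and region[0] <= fragment[1]


def meets_transport_criteria(start, end):
    """Check a contiguous tat fragment [start, end] against criteria (i)-(iii).

    Assumption: "absence" is read as "no residue of the region is present".
    """
    fragment = (start, end)
    has_basic_region = start <= BASIC_REGION[0] and end >= BASIC_REGION[1]
    lacks_excluded = not (
        overlaps(fragment, CYSTEINE_RICH_REGION)
        or overlaps(fragment, EXON2_DOMAIN)
    )
    return has_basic_region and lacks_excluded


for name, (start, end) in {
    "tat 47-58": (47, 58),
    "tat 38-72": (38, 72),
    "tat 1-86": (1, 86),
}.items():
    print(name, meets_transport_criteria(start, end))
```

Under this reading, the fragments 47-58 and 38-72 satisfy all three criteria, while full-length tat (1-86) does not, since it retains both excluded regions.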

2. The fusion protein according to claim 1, wherein the cargo moiety is selected from the group consisting of therapeutic molecules, prophylactic molecules and diagnostic molecules.

3. The fusion protein according to claim 1 or 2, wherein the cargo moiety consists of human papillomavirus E2 repressor and the transport moiety is selected from the group consisting of:

(a) amino acids 47-58 of HIV tat protein (SEQ ID NO: 47);

(b) amino acids 47-72 of HIV tat protein (SEQ ID NO: 48);

(c) amino acids 38-72 of HIV tat protein (SEQ ID NO: 49); and

(d) amino acids 38-58 of HIV tat protein (SEQ ID NO: 50).

4. The fusion protein according to any one of claims 1 to 3, wherein the cargo moiety consists of amino acids 245-365 of the human papillomavirus E2 protein (SEQ ID NO: 51).

5. The fusion protein according to claim 4 selected from the group consisting of JB106 having SEQ ID NO: 38, JB117 having SEQ ID NO: 59, JB118 having SEQ ID NO: 60, JB122 having SEQ ID NO: 63.

6. The fusion protein according to claim 1 or 2, wherein the cargo moiety consists of a bovine papillomavirus E2 repressor and the transport moiety is selected from the group consisting of:

(a) amino acids 47-62 of HIV tat protein (SEQ ID NO: 52); and

(b) amino acids 38-62 of HIV tat protein (SEQ ID NO: 53).

7. The fusion protein according to any one of claims 1, 2 or 6, wherein the cargo moiety is an E2 repressor consisting of amino acids 250-410 of the bovine papillomavirus E2 protein (SEQ ID NO: 56).

8. The fusion protein according to claim 7 which is JB119 having SEQ ID NO: 61 or JB120 having SEQ ID NO: 62. 
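The lengths given in the sequence listing are consistent with each named fusion protein being an amino-terminal methionine followed by a transport moiety and a cargo moiety. The following is a minimal arithmetic sketch; the pairing of each fusion protein with its transport and cargo sequences is inferred here from reading the listed sequences and is an assumption of this example, as are the names `SEQ_LENGTHS`, `FUSIONS` and `expected_length`.

```python
# Lengths (amino acids) as stated in the LENGTH fields of the sequence listing.
SEQ_LENGTHS = {
    48: 26, 49: 35, 50: 21, 52: 15, 53: 25,   # transport moieties
    51: 121, 56: 161,                          # cargo moieties (E2 repressors)
    59: 148, 60: 157, 61: 177, 62: 187, 63: 143,  # fusion proteins
}

# fusion name -> (fusion SEQ ID, transport SEQ ID, cargo SEQ ID)
# Pairings are inferred from the listed sequences (assumption of this sketch).
FUSIONS = {
    "JB117": (59, 48, 51),
    "JB118": (60, 49, 51),
    "JB122": (63, 50, 51),
    "JB119": (61, 52, 56),
    "JB120": (62, 53, 56),
}


def expected_length(transport_id: int, cargo_id: int) -> int:
    """Length of Met + transport moiety + cargo moiety."""
    return 1 + SEQ_LENGTHS[transport_id] + SEQ_LENGTHS[cargo_id]


for name, (fusion_id, transport_id, cargo_id) in FUSIONS.items():
    total = expected_length(transport_id, cargo_id)
    print(name, total, total == SEQ_LENGTHS[fusion_id])
```

For example, JB117 comes out as 1 + 26 + 121 = 148 residues, matching the length stated for SEQ ID NO:59.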

9. The fusion protein of claim 1 or 2, wherein the cargo moiety consists of amino acids 43-412 of HSV VP16 protein and the transport moiety consists of amino acids 47-58 of HIV tat protein.

10. The fusion protein according to any one of claims 1 to 9, wherein the transport moiety is preceded by an amino-terminal methionine.

11. A DNA molecule comprising a nucleotide sequence encoding a fusion protein according to claim 5 or 8.

12. A DNA molecule comprising a nucleotide sequence encoding fusion protein tat-VP16R.GF having SEQ ID NO: 58.

13. The DNA molecule according to claim 11 or 12, wherein the nucleotide sequence encoding the fusion protein is operatively linked to expression control sequences.

14. A unicellular host transformed with a DNA molecule according to claim 13. 

15. A method for producing a fusion protein according to any one of claims 5, 8 or 9 comprising the steps of: 

(a) culturing a transformed unicellular host according to claim 14; and 

(b) recovering the fusion protein from said culture. 

16. A covalently linked chemical conjugate consisting of a transport polypeptide moiety and a cargo moiety, wherein:

(a) the transport polypeptide moiety of the conjugate is characterized by:

(i) the presence of amino acids 49-57 of HIV tat protein;

(ii) the absence of amino acids 22-36 of HIV tat protein; and

(iii) the absence of amino acids 73-86 of HIV tat protein; and

(b) the cargo moiety of the conjugate retains biological activity following transport moiety-dependent intracellular delivery.

17. The covalently linked chemical conjugate according to claim 16, wherein the transport polypeptide moiety consists of amino acids 37-72 of HIV tat protein (SEQ ID NO: 2).

18. The covalently linked chemical conjugate according to claim 17, wherein the cargo moiety is selected from the group consisting of:

(a) amino acids 245-365 of human papillomavirus E2 protein (SEQ ID NO: 51); and

(b) amino acids 245-365 of human papillomavirus E2 protein, wherein amino acids 300 and 309 have been changed to cysteine (SEQ ID NO: 55).

19. The covalently linked chemical conjugate according to claim 17, wherein the cargo moiety is a double-stranded DNA selected from the group consisting of:

(a) oligonucleotide NF1 having SEQ ID NO: 43 annealed to oligonucleotide NF2 having SEQ ID NO: 44; and

(b) oligonucleotide NF3 having SEQ ID NO: 45 annealed to oligonucleotide NF4 having SEQ ID NO: 46.
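Two single-stranded oligonucleotides anneal into a fully double-stranded DNA when one is the reverse complement of the other. The following is a minimal sketch that checks this for SEQ ID NO:45 and SEQ ID NO:46 as given in the sequence listing; the function name `reverse_complement` and the variable names are illustrative assumptions.

```python
# Translation table mapping each DNA base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, ignoring spaces."""
    cleaned = seq.replace(" ", "").upper()
    return cleaned.translate(COMPLEMENT)[::-1]


# Sequences copied from the listing (SEQ ID NO:45 and SEQ ID NO:46).
NF3 = "GTCTACTTTC CGCTGTCTAC TTTCCACGGT CTACTTTCC"
NF4 = "GGAAAGTAGA CCGTGGAAAG TAGACAGCGG AAAGTAGAC"

is_complementary = reverse_complement(NF3) == NF4.replace(" ", "")
print(is_complementary)  # True
```

The check confirms that the two 39-base strands pair along their full length with no overhang.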

20. A pharmaceutical composition comprising a pharmaceutically effective amount of a fusion protein according to any one of claims 1 to 10 or a covalently linked chemical conjugate according to any one of claims 16 to 19.

21. Use of a fusion protein according to any one of claims 1 to 10 or a covalently linked chemical conjugate according to any one of claims 16 to 19 for the preparation of a pharmaceutical composition for the intracellular delivery of cargo.

22. A method for producing a fusion protein consisting of a carboxy-terminal cargo moiety and an amino-terminal transport moiety, characterized by the step of genetically fusing

(a) a transport moiety that is characterized by:

(i) the presence of amino acids 49-57 of HIV tat protein;

(ii) the absence of amino acids 22-36 of HIV tat protein; and

(iii) the absence of amino acids 73-86 of HIV tat protein; and

(b) a cargo moiety that retains biological activity following transport moiety-dependent intracellular delivery.

23. The method according to claim 22, wherein the cargo moiety is selected from the group consisting of therapeutic 
molecules, prophylactic molecules and diagnostic molecules. 

24. The method according to claim 22 or 23, wherein the cargo moiety consists of human papillomavirus E2 repressor and the transport moiety is selected from the group consisting of:

(a) amino acids 47-58 of HIV tat protein (SEQ ID NO: 47);

(b) amino acids 47-72 of HIV tat protein (SEQ ID NO: 48);

(c) amino acids 38-72 of HIV tat protein (SEQ ID NO: 49); and

(d) amino acids 38-58 of HIV tat protein (SEQ ID NO: 50).

25. The method according to any one of claims 22 to 24, wherein the cargo moiety consists of amino acids 245-365 
of the human papillomavirus E2 protein (SEQ ID NO: 51). 

26. The method according to claim 25, wherein said fusion protein is selected from the group consisting of JB106 having SEQ ID NO: 38, JB117 having SEQ ID NO: 59, JB118 having SEQ ID NO: 60, JB122 having SEQ ID NO: 63.

27. The method according to claim 22 or 23, wherein the cargo moiety consists of a bovine papillomavirus E2 repressor and the transport moiety is selected from the group consisting of:

(a) amino acids 47-62 of HIV tat protein (SEQ ID NO: 52); and

(b) amino acids 38-62 of HIV tat protein (SEQ ID NO: 53).

28. The method according to any one of claims 22, 23 or 27, wherein the cargo moiety is an E2 repressor consisting of amino acids 250-410 of the bovine papillomavirus E2 protein (SEQ ID NO: 56).

29. The method according to claim 28, wherein said fusion protein is JB119 having SEQ ID NO: 61 or JB120 having SEQ ID NO: 62.

30. The method of claim 22 or 23, wherein the cargo moiety consists of amino acids 43-412 of HSV VP16 protein and the transport moiety consists of amino acids 47-58 of HIV tat protein.

31. The method according to any one of claims 22 to 30, wherein the transport moiety is preceded by an amino-terminal methionine.

32. A method for producing a DNA molecule comprising a nucleotide sequence encoding a fusion protein consisting of a carboxy-terminal cargo moiety and an amino-terminal transport moiety comprising the step of introducing into a plasmid a nucleotide sequence encoding a fusion protein produced by the method according to claim 26 or 29.

33. A method for producing a DNA molecule comprising a nucleotide sequence encoding a fusion protein consisting of a carboxy-terminal cargo moiety and an amino-terminal transport moiety comprising the step of introducing into a plasmid a nucleotide sequence encoding fusion protein tat-VP16R.GF having SEQ ID NO: 58.

34. The method according to claim 32 or 33, wherein the nucleotide sequence encoding the fusion protein is operatively linked to expression control sequences.

35. A method for transforming a unicellular host comprising the step of introducing into said host a DNA molecule produced by the method according to claim 34.

36. A method for producing a fusion protein according to any one of claims 26, 29 or 30 comprising the steps of:

(a) culturing a transformed unicellular host produced by the method according to claim 35; and

(b) recovering the fusion protein from said culture.

37. A method for producing a covalently linked chemical conjugate consisting of a transport polypeptide moiety and a cargo moiety, comprising the step of linking of:

(a) a transport polypeptide moiety that is characterized by:

(i) the presence of amino acids 49-57 of HIV tat protein;

(ii) the absence of amino acids 22-36 of HIV tat protein; and

(iii) the absence of amino acids 73-86 of HIV tat protein; and

(b) a cargo moiety that retains biological activity following transport moiety-dependent intracellular delivery.

38. The method according to claim 37, wherein the transport polypeptide moiety consists of amino acids 37-72 of HIV tat protein (SEQ ID NO: 2).

39. The method according to claim 38, wherein the cargo moiety is selected from the group consisting of:

(a) amino acids 245-365 of human papillomavirus E2 protein (SEQ ID NO: 51); and

(b) amino acids 245-365 of human papillomavirus E2 protein, wherein amino acids 300 and 309 have been changed to cysteine (SEQ ID NO: 55).

40. The method according to claim 38, wherein the cargo moiety is a double-stranded DNA selected from the group consisting of:

(a) oligonucleotide NF1 having SEQ ID NO: 43 annealed to oligonucleotide NF2 having SEQ ID NO: 44; and

(b) oligonucleotide NF3 having SEQ ID NO: 45 annealed to oligonucleotide NF4 having SEQ ID NO: 46.

41. A method for the preparation of a pharmaceutical composition comprising a pharmaceutically effective amount of a fusion protein produced by the method according to any one of claims 22 to 31 or a covalently linked chemical conjugate produced by the method according to any one of claims 37 to 40, wherein said fusion protein or said covalently linked chemical conjugate is formulated with a pharmaceutically acceptable carrier.



Claims (Patentansprüche)

1. A fusion protein consisting of a carboxy-terminal cargo moiety and an amino-terminal transport moiety, wherein

(a) the transport moiety is characterized by:

(i) the presence of amino acids 49 to 57 of HIV tat protein;

(ii) the absence of amino acids 22 to 36 of HIV tat protein; and

(iii) the absence of amino acids 73 to 86 of HIV tat protein; and

(b) the cargo moiety retains biological activity following transport moiety-dependent intracellular delivery.

2. The fusion protein according to claim 1, wherein the cargo moiety is a therapeutic, prophylactic or diagnostic molecule.

3. The fusion protein according to claim 1 or 2, wherein the cargo moiety consists of the human papillomavirus E2 repressor and the transport moiety comprises:

(a) amino acids 47 to 58 of HIV tat protein (SEQ ID NO: 47);

(b) amino acids 47 to 72 of HIV tat protein (SEQ ID NO: 48);

(c) amino acids 38 to 72 of HIV tat protein (SEQ ID NO: 49); or

(d) amino acids 38 to 58 of HIV tat protein (SEQ ID NO: 50).

4. The fusion protein according to any one of claims 1 to 3, wherein the cargo moiety consists of amino acids 245 to 365 of the human papillomavirus E2 protein (SEQ ID NO: 51).

5. The fusion protein according to claim 4, selected from JB106, which has the sequence of SEQ ID NO: 38, JB117, which has the sequence of SEQ ID NO: 59, JB118, which has the sequence of SEQ ID NO: 60, or JB122, which has the sequence of SEQ ID NO: 63.

6. The fusion protein according to claim 1 or 2, wherein the cargo moiety consists of the bovine papillomavirus E2 repressor and the transport moiety comprises:

(a) amino acids 47 to 62 of HIV tat protein (SEQ ID NO: 52); or

(b) amino acids 38 to 62 of HIV tat protein (SEQ ID NO: 53).

7. The fusion protein according to any one of claims 1, 2 or 6, wherein the cargo moiety is an E2 repressor consisting of amino acids 250 to 410 of the bovine papillomavirus E2 protein (SEQ ID NO: 56).

8. The fusion protein according to claim 7, which is JB119, which has the sequence of SEQ ID NO: 61, or JB120, which has the sequence of SEQ ID NO: 62.

9. The fusion protein according to claim 1 or 2, wherein the cargo moiety consists of amino acids 43 to 412 of the HSV VP16 protein and the transport moiety consists of amino acids 47 to 58 of HIV tat protein.

10. The fusion protein according to any one of claims 1 to 9, wherein the transport moiety is preceded by an amino-terminal methionine.

11. A DNA molecule comprising a nucleotide sequence that encodes a fusion protein according to claim 5 or 8.

12. A DNA molecule comprising a nucleotide sequence that encodes a fusion protein tat-VP16R.GF, which has the sequence of SEQ ID NO: 58.

13. The DNA molecule according to claim 11 or 12, wherein the nucleotide sequence that encodes the fusion protein is operatively linked to expression control sequences.

14. A unicellular host transformed with a DNA molecule according to claim 13.

15. A method for producing a fusion protein according to any one of claims 5, 8 or 9, comprising the steps of:

(a) culturing a transformed unicellular host according to claim 13; and

(b) recovering the fusion protein from the culture.

16. A covalently linked chemical conjugate consisting of a transport polypeptide moiety and a cargo moiety, wherein:

(a) the transport polypeptide moiety of the conjugate is characterized by:

(i) the presence of amino acids 49 to 57 of HIV tat protein;

(ii) the absence of amino acids 22 to 36 of HIV tat protein; and

(iii) the absence of amino acids 73 to 86 of HIV tat protein; and

(b) the cargo moiety of the conjugate retains biological activity following transport moiety-dependent intracellular delivery.

17. The covalently linked chemical conjugate according to claim 16, wherein the transport polypeptide moiety consists of amino acids 37 to 72 of HIV tat protein (SEQ ID NO: 2).

18. The covalently linked chemical conjugate according to claim 17, wherein the cargo moiety comprises:

(a) amino acids 245 to 365 of the human papillomavirus E2 protein (SEQ ID NO: 51); or

(b) amino acids 245 to 365 of the human papillomavirus E2 protein, wherein amino acids 300 and 309 have been replaced by cysteine (SEQ ID NO: 55).

19. The covalently linked chemical conjugate according to claim 17, wherein the cargo moiety is a double-stranded DNA selected from:

(a) oligonucleotide NF1, which has the sequence of SEQ ID NO: 43, annealed to oligonucleotide NF2, which has the sequence of SEQ ID NO: 44; and

(b) oligonucleotide NF3, which has the sequence of SEQ ID NO: 45, annealed to oligonucleotide NF4, which has the sequence of SEQ ID NO: 46.

20. A medicament comprising a pharmaceutically effective amount of a fusion protein according to any one of claims 1 to 10 or of a covalently linked chemical conjugate according to any one of claims 16 to 19.

21. Use of a fusion protein according to any one of claims 1 to 10 or of a covalently linked chemical conjugate according to any one of claims 16 to 19 for the preparation of a medicament for the intracellular delivery of a cargo molecule.

22. Verfahren zur Herstellung eines Fusionsproteins bestehend aus einer carboxyterminalen Frachteinheit und einer 
aminoterminalen Transporteinheil, gekennzeichnet durch den Schritt einer genetischen Fusion von: 

35 (a) einer Transporteinheit, die gekennzeichnet ist durch: 

(1) das Vorhandensein der Aminosauren 49 bis 57 des HIV-tat-Proteins; 

(ii) das Fehlen der Aminosauren 22 bis 36 des HIV-tat-Proteins; und 

(iii) das Fehlen der Aminosauren 73 bis 86 des HIV-tat-Proteins; und 

40 

(b) einer Frachteinheit, die die biologische Aktivitat nach der Transporteinheit-abhangigen intrazellularen Ab- 
lieferung behalt. 

23. Verfahren nach Anspruch 22, wobei die Frachteinheit ein the rape utisches, prophylaktisches oder diagnostisches 
45 Molekul ist. 

24. Process according to claim 22 or 23, wherein the cargo moiety consists of the human papillomavirus E2 repressor and the transport moiety comprises:

(a) amino acids 47 to 58 of HIV tat protein (SEQ ID NO: 47);

(b) amino acids 47 to 72 of HIV tat protein (SEQ ID NO: 48);

(c) amino acids 38 to 72 of HIV tat protein (SEQ ID NO: 49); or

(d) amino acids 38 to 58 of HIV tat protein (SEQ ID NO: 50).

25. Process according to any one of claims 22 to 24, wherein the cargo moiety consists of amino acids 245 to 365 of the human papillomavirus E2 protein (SEQ ID NO: 51).
26. Process according to claim 25, wherein the fusion protein is selected from JB106, having the sequence of SEQ ID NO: 38, JB117, having the sequence of SEQ ID NO: 59, JB118, having the sequence of SEQ ID NO: 60, and JB122, having the sequence of SEQ ID NO: 63.

27. Process according to claim 22 or 23, wherein the cargo moiety consists of the bovine papillomavirus E2 repressor and the transport moiety comprises:

(a) amino acids 47 to 62 of HIV tat protein (SEQ ID NO: 52); or

(b) amino acids 38 to 62 of HIV tat protein (SEQ ID NO: 53).

28. Process according to any one of claims 22, 23 or 27, wherein the cargo moiety is an E2 repressor consisting of amino acids 250 to 410 of the bovine papillomavirus E2 protein (SEQ ID NO: 56).

29. Process according to claim 28, wherein the fusion protein is JB119, having the sequence of SEQ ID NO: 61, or JB120, having the sequence of SEQ ID NO: 62.

30. Process according to claim 22 or 23, wherein the cargo moiety consists of amino acids 43 to 412 of HSV VP16 protein and the transport moiety consists of amino acids 47 to 58 of HIV tat protein.

31. Process according to any one of claims 22 to 30, wherein the transport moiety is preceded by an amino-terminal methionine.

32. Process for producing a DNA molecule comprising a nucleotide sequence encoding a fusion protein consisting of a carboxy-terminal cargo moiety and an amino-terminal transport moiety, comprising the step of introducing into a plasmid a nucleotide sequence encoding a fusion protein produced by the process according to claim 26 or 29.

33. Process for producing a DNA molecule comprising a nucleotide sequence encoding a fusion protein consisting of a carboxy-terminal cargo moiety and an amino-terminal transport moiety, comprising the step of introducing into a plasmid a nucleotide sequence encoding the fusion protein tat-VP16R.GF, having the sequence of SEQ ID NO: 58.

34. Process according to claim 32 or 33, wherein the nucleotide sequence encoding the fusion protein is operatively linked to expression control sequences.

35. Process for transforming a unicellular host, comprising the step of introducing into the host a DNA molecule produced by the process according to claim 34.

36. Process for producing a fusion protein according to any one of claims 26, 29 or 30, comprising the steps of:

(a) culturing a transformed unicellular host produced by the process according to claim 35; and

(b) recovering the fusion protein from the culture.

37. Process for producing a covalently linked chemical conjugate consisting of a transport polypeptide moiety and a cargo moiety, comprising the step of linking:

(a) a transport polypeptide moiety characterized by:

(i) the presence of amino acids 49 to 57 of HIV tat protein;

(ii) the absence of amino acids 22 to 36 of HIV tat protein; and

(iii) the absence of amino acids 73 to 86 of HIV tat protein; and

(b) a cargo moiety that retains biological activity following transport moiety-dependent intracellular delivery.
38. Process according to claim 37, wherein the transport polypeptide moiety consists of amino acids 37 to 72 of HIV tat protein (SEQ ID NO: 2).

39. Process according to claim 38, wherein the cargo moiety comprises:

(a) amino acids 245 to 365 of the human papillomavirus E2 protein (SEQ ID NO: 51); or

(b) amino acids 245 to 365 of the human papillomavirus E2 protein, wherein amino acids 300 and 309 have been replaced by cysteine (SEQ ID NO: 55).

40. Process according to claim 38, wherein the cargo moiety is a double-stranded DNA selected from:

(a) oligonucleotide NF1, having the sequence of SEQ ID NO: 43, annealed to oligonucleotide NF2, having the sequence of SEQ ID NO: 44; and

(b) oligonucleotide NF3, having the sequence of SEQ ID NO: 45, annealed to oligonucleotide NF4, having the sequence of SEQ ID NO: 46.

41. Process for producing a pharmaceutical composition comprising a pharmaceutically effective amount of a fusion protein produced by the process according to any one of claims 22 to 31, or of a covalently linked chemical conjugate produced by the process according to any one of claims 37 to 40, wherein the fusion protein or the covalently linked chemical conjugate is formulated with a pharmaceutically acceptable carrier.

Claims

1. Fusion protein consisting of a carboxy-terminal cargo moiety and an amino-terminal transport moiety, wherein:

(a) the transport moiety is characterized by:

(i) the presence of amino acids 49-57 of HIV tat protein;

(ii) the absence of amino acids 22-36 of HIV tat protein; and

(iii) the absence of amino acids 73-86 of HIV tat protein; and

(b) the cargo moiety retains biological activity following transport moiety-dependent intracellular delivery.

2. Fusion protein according to claim 1, wherein the cargo moiety is selected from therapeutic molecules, prophylactic molecules and diagnostic molecules.

3. Fusion protein according to claim 1 or 2, wherein the cargo moiety consists of the human papillomavirus E2 repressor and the transport moiety is selected from:

(a) amino acids 47-58 of HIV tat protein (SEQ ID NO: 47);

(b) amino acids 47-72 of HIV tat protein (SEQ ID NO: 48);

(c) amino acids 38-72 of HIV tat protein (SEQ ID NO: 49); and

(d) amino acids 38-58 of HIV tat protein (SEQ ID NO: 50).

4. Fusion protein according to any one of claims 1 to 3, wherein the cargo moiety consists of amino acids 245-365 of the human papillomavirus E2 protein (SEQ ID NO: 51).

5. Fusion protein according to claim 4, selected from JB106 having SEQ ID NO: 38, JB117 having SEQ ID NO: 59, JB118 having SEQ ID NO: 60 and JB122 having SEQ ID NO: 63.

6. Fusion protein according to claim 1 or 2, wherein the cargo moiety consists of a bovine papillomavirus E2 repressor and the transport moiety is selected from:

(a) amino acids 47-62 of HIV tat protein (SEQ ID NO: 52); and

(b) amino acids 38-62 of HIV tat protein (SEQ ID NO: 53).

7. Fusion protein according to any one of claims 1, 2 or 6, wherein the cargo moiety is an E2 repressor consisting of amino acids 250-410 of the bovine papillomavirus E2 protein (SEQ ID NO: 56).

8. Fusion protein according to claim 7, which is JB119 having SEQ ID NO: 61 or JB120 having SEQ ID NO: 62.

9. Fusion protein according to claim 1 or 2, wherein the cargo moiety consists of amino acids 43-412 of HSV VP16 protein and the transport moiety consists of amino acids 47-58 of HIV tat protein.

10. Fusion protein according to any one of claims 1 to 9, wherein the transport moiety is preceded by an amino-terminal methionine.

11. DNA molecule comprising a nucleotide sequence encoding a fusion protein according to claim 5 or 8.

12. DNA molecule comprising a nucleotide sequence encoding the fusion protein tat-VP16R.GF having SEQ ID NO: 58.

13. DNA molecule according to claim 11 or 12, wherein the nucleotide sequence encoding the fusion protein is operatively linked to expression control sequences.

14. Unicellular host transformed with a DNA molecule according to claim 13.

15. Process for producing a fusion protein according to any one of claims 5, 8 or 9, comprising the steps of:

(a) culturing a transformed unicellular host according to claim 13; and

(b) recovering the fusion protein from said culture.

16. Covalently linked chemical conjugate consisting of a transport polypeptide moiety and a cargo moiety, wherein:

(a) the transport polypeptide moiety of the conjugate is characterized by:

(i) the presence of amino acids 49-57 of HIV tat protein;

(ii) the absence of amino acids 22-36 of HIV tat protein; and

(iii) the absence of amino acids 73-86 of HIV tat protein; and

(b) the cargo moiety of the conjugate retains biological activity following transport moiety-dependent intracellular delivery.

17. Covalently linked chemical conjugate according to claim 16, wherein the transport polypeptide moiety consists of amino acids 37-72 of HIV tat protein (SEQ ID NO: 2).

18. Covalently linked chemical conjugate according to claim 17, wherein the cargo moiety is selected from:

(a) amino acids 245-365 of the human papillomavirus E2 protein (SEQ ID NO: 51); and

(b) amino acids 245-365 of the human papillomavirus E2 protein, wherein amino acids 300 and 309 have been changed to cysteine (SEQ ID NO: 55).

19. Covalently linked chemical conjugate according to claim 17, wherein the cargo moiety is a double-stranded DNA selected from:

(a) oligonucleotide NF1 having SEQ ID NO: 43 annealed to oligonucleotide NF2 having SEQ ID NO: 44; and

(b) oligonucleotide NF3 having SEQ ID NO: 45 annealed to oligonucleotide NF4 having SEQ ID NO: 46.

20. Pharmaceutical composition comprising a pharmaceutically effective amount of a fusion protein according to any one of claims 1 to 10 or of a covalently linked chemical conjugate according to any one of claims 16 to 19.

21. Use of a fusion protein according to any one of claims 1 to 10 or of a covalently linked chemical conjugate according to any one of claims 16 to 19 for the preparation of a pharmaceutical composition for the intracellular delivery of a cargo.

22. Process for producing a fusion protein consisting of a carboxy-terminal cargo moiety and an amino-terminal transport moiety, characterized by the step of genetically fusing:

(a) a transport moiety characterized by:

(i) the presence of amino acids 49-57 of HIV tat protein;

(ii) the absence of amino acids 22-36 of HIV tat protein; and

(iii) the absence of amino acids 73-86 of HIV tat protein; and

(b) a cargo moiety that retains biological activity following transport moiety-dependent intracellular delivery.

23. Process according to claim 22, wherein the cargo moiety is selected from therapeutic molecules, prophylactic molecules and diagnostic molecules.

24. Process according to claim 22 or 23, wherein the cargo moiety consists of the human papillomavirus E2 repressor and the transport moiety is selected from:

(a) amino acids 47-58 of HIV tat protein (SEQ ID NO: 47);

(b) amino acids 47-72 of HIV tat protein (SEQ ID NO: 48);

(c) amino acids 38-72 of HIV tat protein (SEQ ID NO: 49); and

(d) amino acids 38-58 of HIV tat protein (SEQ ID NO: 50).

25. Process according to any one of claims 22 to 24, wherein the cargo moiety consists of amino acids 245-365 of the human papillomavirus E2 protein (SEQ ID NO: 51).

26. Process according to claim 25, wherein said fusion protein is selected from JB106 having SEQ ID NO: 38, JB117 having SEQ ID NO: 59, JB118 having SEQ ID NO: 60 and JB122 having SEQ ID NO: 63.

27. Process according to claim 22 or 23, wherein the cargo moiety consists of the bovine papillomavirus E2 repressor and the transport moiety is selected from:

(a) amino acids 47-62 of HIV tat protein (SEQ ID NO: 52); and

(b) amino acids 38-62 of HIV tat protein (SEQ ID NO: 53).

28. Process according to any one of claims 22, 23 or 27, wherein the cargo moiety is an E2 repressor consisting of amino acids 250-410 of the bovine papillomavirus E2 protein (SEQ ID NO: 56).

29. Process according to claim 28, wherein said fusion protein is JB119 having SEQ ID NO: 61 or JB120 having SEQ ID NO: 62.

30. Process according to claim 22 or 23, wherein the cargo moiety consists of amino acids 43-412 of HSV VP16 protein and the transport moiety consists of amino acids 47-58 of HIV tat protein.

31. Process according to any one of claims 22 to 30, wherein the transport moiety is preceded by an amino-terminal methionine.
32. Process for producing a DNA molecule comprising a nucleotide sequence encoding a fusion protein consisting of a carboxy-terminal cargo moiety and an amino-terminal transport moiety, comprising the step of introducing into a plasmid a nucleotide sequence encoding a fusion protein produced by the process according to claim 26 or 29.

33. Process for producing a DNA molecule comprising a nucleotide sequence encoding a fusion protein consisting of a carboxy-terminal cargo moiety and an amino-terminal transport moiety, comprising the step of introducing into a plasmid a nucleotide sequence encoding the fusion protein tat-VP16R.GF having SEQ ID NO: 58.

34. Process according to claim 32 or 33, wherein the nucleotide sequence encoding the fusion protein is operatively linked to expression control sequences.

35. Process for transforming a unicellular host, comprising the step of introducing into said host a DNA molecule produced by the process according to claim 34.

36. Process for producing a fusion protein according to any one of claims 26, 29 or 30, comprising the steps of:

(a) culturing a transformed unicellular host produced by the process according to claim 35; and

(b) recovering the fusion protein from said culture.

37. Process for producing a covalently linked chemical conjugate consisting of a transport polypeptide moiety and a cargo moiety, comprising the step of linking:

(a) a transport polypeptide moiety characterized by:

(i) the presence of amino acids 49-57 of HIV tat protein;

(ii) the absence of amino acids 22-36 of HIV tat protein; and

(iii) the absence of amino acids 73-86 of HIV tat protein; and

(b) a cargo moiety that retains biological activity following transport moiety-dependent intracellular delivery.

38. Process according to claim 37, wherein the transport polypeptide moiety consists of amino acids 37-72 of HIV tat protein (SEQ ID NO: 2).

39. Process according to claim 38, wherein the cargo moiety is selected from:

(a) amino acids 245-365 of the human papillomavirus E2 protein (SEQ ID NO: 51); and

(b) amino acids 245-365 of the human papillomavirus E2 protein, wherein amino acids 300 and 309 have been changed to cysteine (SEQ ID NO: 55).

40. Process according to claim 38, wherein the cargo moiety is a double-stranded DNA selected from:

(a) oligonucleotide NF1 having SEQ ID NO: 43 annealed to oligonucleotide NF2 having SEQ ID NO: 44; and

(b) oligonucleotide NF3 having SEQ ID NO: 45 annealed to oligonucleotide NF4 having SEQ ID NO: 46.

41. Process for preparing a pharmaceutical composition comprising a pharmaceutically effective amount of a fusion protein produced by the process according to any one of claims 22 to 31 or of a covalently linked chemical conjugate produced by the process according to any one of claims 37 to 40, wherein said fusion protein or said covalently linked chemical conjugate is formulated with a pharmaceutically acceptable carrier.
FIG. 1

Met Glu Pro Val Asp Pro Arg Leu Glu Pro Trp Lys His Pro Gly
1               5                   10                  15

Ser Gln Pro Lys Thr Ala Cys Thr Asn Cys Tyr Cys Lys Lys Cys
                20                  25                  30

Cys Phe His Cys Gln Val Cys Phe Ile Thr Lys Ala Leu Gly Ile
                35                  40                  45

Ser Tyr Gly Arg Lys Lys Arg Arg Gln Arg Arg Arg Pro Pro Gln
                50                  55                  60

Gly Ser Gln Thr His Gln Val Ser Leu Ser Lys Gln Pro Thr Ser
                65                  70                  75

Gln Ser Arg Gly Asp Pro Thr Gly Pro Lys Glu